Symbolic expression generation via variational auto-encoder

There are many problems in physics, biology, and other natural sciences in which symbolic regression can provide valuable insights and discover new laws of nature. Widely used deep neural networks do not provide interpretable solutions, whereas symbolic expressions give a clear relation between observations and the target variable. However, there is currently no dominant solution for the symbolic regression task, and we aim to reduce this gap with our algorithm. In this work, we propose a novel deep learning framework for symbolic expression generation via a variational autoencoder (VAE). We suggest using a VAE to generate mathematical expressions, and our training strategy forces the generated formulas to fit a given dataset. Our framework allows encoding a priori knowledge about the formulas into fast-check predicates that speed up the optimization process. We compare our method to modern symbolic regression benchmarks and show that it outperforms the competitors under noisy conditions. The recovery rate of SEGVAE is 65% on the Nguyen dataset with a noise level of 10%, which is better than the previously reported state of the art (SOTA) by 20%. We demonstrate that this value depends on the dataset and can be even higher.
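
To make the abstract's high-level description concrete, here is a minimal, hypothetical sketch in PyTorch of the two ingredients it names: an LSTM-based sequence VAE over prefix-notation expression tokens, and a cheap fast-check predicate applied to raw tokens before any expensive constant fitting. Everything below (ExprVAE, fast_check, the toy vocabulary) is illustrative and assumed; it is not the authors' SEGVAE implementation.

```python
# Illustrative sketch only -- NOT the paper's SEGVAE code. Assumes PyTorch.
import torch
import torch.nn as nn

VOCAB = ["<pad>", "<sos>", "<eos>", "add", "mul", "sin", "x", "c"]  # toy vocabulary
PAD, SOS, EOS = 0, 1, 2

class ExprVAE(nn.Module):
    """LSTM sequence VAE over prefix-notation expression tokens (hypothetical)."""
    def __init__(self, vocab=len(VOCAB), emb=32, hid=64, z_dim=16):
        super().__init__()
        self.embed = nn.Embedding(vocab, emb, padding_idx=PAD)
        self.encoder = nn.LSTM(emb, hid, batch_first=True)
        self.to_mu = nn.Linear(hid, z_dim)
        self.to_logvar = nn.Linear(hid, z_dim)
        self.z_to_h = nn.Linear(z_dim, hid)
        self.decoder = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, tokens):
        # Encode the token sequence into a latent Gaussian.
        _, (h, _) = self.encoder(self.embed(tokens))
        mu, logvar = self.to_mu(h[-1]), self.to_logvar(h[-1])
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
        # Decode with teacher forcing, conditioning the initial hidden state on z.
        h0 = torch.tanh(self.z_to_h(z)).unsqueeze(0)
        dec, _ = self.decoder(self.embed(tokens), (h0, torch.zeros_like(h0)))
        return self.out(dec), mu, logvar

def vae_loss(logits, targets, mu, logvar):
    # Next-token reconstruction cross-entropy + KL divergence to N(0, I).
    rec = nn.functional.cross_entropy(
        logits[:, :-1].reshape(-1, logits.size(-1)),
        targets[:, 1:].reshape(-1), ignore_index=PAD)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

def fast_check(token_ids):
    # Example a-priori predicate: reject formulas nesting sin inside sin
    # (adjacent "sin sin" in prefix notation), before any costly fitting.
    names = [VOCAB[t] for t in token_ids]
    return not any(a == "sin" and b == "sin" for a, b in zip(names, names[1:]))

if __name__ == "__main__":
    model = ExprVAE()
    # Toy batch: padded prefix sequences "add x c" and "sin x".
    batch = torch.tensor([[SOS, 3, 6, 7, EOS], [SOS, 5, 6, EOS, PAD]])
    logits, mu, logvar = model(batch)
    loss = vae_loss(logits, batch, mu, logvar)
    loss.backward()  # a data-fit term would be added by the training strategy
    print(fast_check([5, 5, 6]))  # sin(sin(x)) -> False, rejected cheaply
```

In this reading, generation samples z from the prior, decodes token by token, and discards any candidate failing fast_check before fitting constants against the dataset; the predicate filter is what the abstract credits with speeding up the optimization.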

Bibliographic Details
Main Authors: Sergei Popov, Mikhail Lazarev, Vladislav Belavin, Denis Derkach, Andrey Ustyuzhanin
Format: Article
Language: English
Published: PeerJ Inc., 2023-03-01
Series: PeerJ Computer Science
ISSN: 2376-5992
DOI: 10.7717/peerj-cs.1241
Author Affiliation (all authors): Department of Computer Science, Higher School of Economics, Moscow, Russia
Subjects: Symbolic regression; VAE; LSTM; Constrained optimization; Generation; Machine learning
Online Access: https://peerj.com/articles/cs-1241.pdf