Effects of reducing redundant parameters in parameter optimization for symbolic regression using genetic programming

Cited: 0
Authors
Kronberger, Gabriel [1 ]
de Franca, Fabricio Olivetti [2 ]
Affiliations
[1] Univ Appl Sci Upper Austria, Sch Informat Commun & Media, Heurist & Evolutionary Algorithms Lab HEAL, Software Pk 11, A-4232 Hagenberg, Austria
[2] Univ Fed ABC, Ctr Math Comp & Cognit, Heurist Anal & Learning Lab HAL, Santo Andre, Brazil
Funding
São Paulo Research Foundation (FAPESP), Brazil;
Keywords
Symbolic regression; Machine learning; Expression rewriting; Equality saturation; Nonlinear least squares;
DOI
10.1016/j.jsc.2024.102413
Chinese Library Classification (CLC)
TP301 [Theory, Methods];
Discipline code
081202;
Abstract
Gradient-based local optimization has been shown to improve the results of genetic programming (GP) for symbolic regression (SR), a machine learning method for learning symbolic equations. Correspondingly, several state-of-the-art GP implementations use iterative nonlinear least squares (NLS) algorithms for local optimization of parameters. An issue that has, however, mostly been ignored in the SR and GP literature is overparameterization of SR expressions and, as a consequence, bad conditioning of the NLS optimization problem. The aim of this study is to analyze the effects of overparameterization on the NLS results and convergence speed, whereby we use Operon as an example GP/SR implementation. First, we demonstrate on a set of six selected benchmark problems that numeric rank approximation can be used to detect overparameterization. In the second part, we analyze whether the NLS results or convergence speed can be improved by simplifying expressions with equality saturation to remove redundant parameters. This analysis is done with the much larger Feynman symbolic regression benchmark set after collecting all expressions visited by GP, as the simplification procedure is not fast enough to be used within GP fitness evaluation. We observe that Operon frequently visits overparameterized solutions, but the number of redundant parameters is small on average. We analyzed the Pareto-optimal expressions of the first and last generations of GP and found that for 70% to 80% of the simplified expressions, the success rate of reaching the optimum was better than or equal to that of the overparameterized form.
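The abstract's first contribution, detecting overparameterization via numeric rank approximation, can be illustrated with a minimal sketch (an assumption for illustration, not the paper's or Operon's actual code): if the Jacobian of the model output with respect to its parameters has numeric rank lower than the parameter count, some parameters are redundant.

```python
# Sketch: detect redundant parameters of a regression model by the numeric
# rank of the Jacobian of predictions w.r.t. parameters. The model functions
# below are hypothetical examples, not taken from the paper.
import numpy as np

def jacobian(f, theta, x, eps=1e-6):
    """Central finite-difference Jacobian d f(x; theta) / d theta.
    Rows correspond to samples, columns to parameters."""
    J = np.empty((len(x), len(theta)))
    for j in range(len(theta)):
        tp = theta.copy(); tp[j] += eps
        tm = theta.copy(); tm[j] -= eps
        J[:, j] = (f(x, tp) - f(x, tm)) / (2 * eps)
    return J

def redundant_parameters(f, theta, x, tol=1e-8):
    """Redundant parameter count = #parameters - numeric rank of Jacobian,
    where the rank is estimated from the singular values of J."""
    J = jacobian(f, np.asarray(theta, dtype=float), x)
    s = np.linalg.svd(J, compute_uv=False)
    rank = int(np.sum(s > tol * s[0]))
    return len(theta) - rank

x = np.linspace(1.0, 2.0, 50)

# Overparameterized: t[0]*t[1] acts as a single effective parameter, so the
# Jacobian columns for t[0] and t[1] are linearly dependent (both ~ x).
f_over = lambda x, t: t[0] * t[1] * x + t[2]
# Simplified equivalent with the redundant parameter removed.
f_simpl = lambda x, t: t[0] * x + t[1]

print(redundant_parameters(f_over, [1.5, 2.0, 0.3], x))   # -> 1
print(redundant_parameters(f_simpl, [3.0, 0.3], x))       # -> 0
```

In the paper's setting this check is applied to expressions produced by GP; rewriting `t0 * t1 * x` to `t0 * x` is exactly the kind of simplification that equality saturation performs to restore a well-conditioned NLS problem.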
Pages: 16