Choosing function sets with better generalisation performance for symbolic regression models

Cited by: 19
Authors
Nicolau, Miguel [1]
Agapitos, Alexandros [2]
Affiliations
[1] Univ Coll Dublin, Coll Business, Dublin, Ireland
[2] Huawei Technol Ltd, Ireland Res Ctr, Dublin, Ireland
Keywords
Symbolic regression; Genetic Programming; Machine learning; Generalisation; Overfitting; Data-driven modelling; REGULARIZATION APPROACH; BLOAT CONTROL; PREDICTION; ENSEMBLE; RISK; GP;
DOI
10.1007/s10710-020-09391-4
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Supervised learning by means of Genetic Programming (GP) aims at the evolutionary synthesis of a model that achieves a balance between approximating the target function on the training data and generalising on new data. The model space searched by the Evolutionary Algorithm is populated by compositions of primitive functions defined in a function set. Since the target function is unknown, the choice of function set's constituent elements is primarily guided by the makeup of function sets traditionally used in the GP literature. Our work builds upon previous research of the effects of protected arithmetic operators (i.e. division, logarithm, power) on the output value of an evolved model for input data points not encountered during training. The scope is to benchmark the approximation/generalisation of models evolved using different function set choices across a range of 43 symbolic regression problems. The salient outcomes are as follows. Firstly, Koza's protected operators of division and exponentiation have a detrimental effect on generalisation, and should therefore be avoided. This result is invariant of the use of moderately sized validation sets for model selection. Secondly, the performance of the recently introduced analytic quotient operator is comparable to that of the sinusoidal operator on average, with their combination being advantageous to both approximation and generalisation. These findings are consistent across two different system implementations, those of standard expression-tree GP and linear Grammatical Evolution. We highlight that this study employed very large test sets, which create confidence when benchmarking the effect of different combinations of primitive functions on model generalisation. Our aim is to encourage GP researchers and practitioners to use similar stringent means of assessing generalisation of evolved models where possible, and also to avoid certain primitive functions that are known to be inappropriate.
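As a rough illustration of why the choice of division operator matters, the sketch below contrasts a Koza-style protected division with the analytic quotient. Both definitions are assumptions that are not stated in this record: protected division is taken to return 1 when the denominator is exactly zero, and the analytic quotient is taken as AQ(a, b) = a / sqrt(1 + b^2). The function names are hypothetical.

```python
import math


def protected_div(a: float, b: float) -> float:
    """Koza-style protected division (assumed form): 1.0 if b is zero, else a / b."""
    return 1.0 if b == 0 else a / b


def analytic_quotient(a: float, b: float) -> float:
    """Analytic quotient (assumed form): a / sqrt(1 + b^2), defined for every b."""
    return a / math.sqrt(1.0 + b * b)


if __name__ == "__main__":
    # Denominators approaching zero, as might occur for inputs unseen in training.
    for b in (1.0, 1e-3, 1e-9, 0.0):
        print(b, protected_div(1.0, b), analytic_quotient(1.0, b))
```

Under these assumptions, protected division becomes very large as the denominator approaches zero and then jumps to 1 at exactly zero, whereas the analytic quotient never exceeds the numerator in magnitude. This is one plausible reading of the generalisation effect reported in the abstract, not a reproduction of the authors' experiments.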
Pages: 73-100
Page count: 28