Modular Multitree Genetic Programming for Evolutionary Feature Construction for Regression

Times cited: 2
Authors
Zhang, Hengzhe [1, 2]
Chen, Qi [1, 2]
Xue, Bing [1, 2]
Banzhaf, Wolfgang [3]
Zhang, Mengjie [1, 2]
Affiliations
[1] Victoria Univ Wellington, Ctr Data Sci & Artificial Intelligence, Wellington 6140, New Zealand
[2] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington 6140, New Zealand
[3] Michigan State Univ, Dept Comp Sci & Engn, E Lansing 48824, MI USA
Keywords
Genetic programming; Task analysis; Semantics; Random forests; Machine learning algorithms; Computational modeling; Contracts; Evolutionary feature construction; evolutionary forest; genetic programming (GP); modularity; random forest; selection; operator
DOI
10.1109/TEVC.2023.3318638
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Evolutionary feature construction is a key technique in evolutionary machine learning that aims to construct high-level features that enhance the performance of a learning algorithm. In real-world applications, engineers typically construct complex features from combinations of basic features, reusing those features as modules. However, modularity in evolutionary feature construction remains an open research topic. This article addresses that gap by proposing a modular and hierarchical multitree genetic programming (GP) algorithm that allows trees to use the output values of other trees, thereby representing expressive features in a compact form. Based on this new representation, we propose a macro parent-repair strategy to reduce redundant and irrelevant features, a macro crossover operator to preserve interactive features, and an adaptive control strategy for crossover and mutation rates to dynamically balance the tradeoff between exploration and exploitation. A comparison with seven bloat control methods on 98 regression datasets shows that the proposed modular representation achieves significantly better test performance and a smaller model size. Experimental results on the state-of-the-art Symbolic Regression Benchmark (SRBench) demonstrate that the proposed symbolic regression method outperforms 22 existing symbolic regression and machine learning algorithms, providing empirical evidence for the superiority of the modularized evolutionary feature construction method.
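The core representational idea in the abstract is that one tree in an individual can use the output value of another tree as a reusable module. Below is a minimal sketch of that idea, not the authors' implementation. The tuple node encoding and the names `construct_features` and `expand` are hypothetical and exist only for illustration. The sketch also makes one assumption: a tree may reference only trees that precede it, which keeps the module graph acyclic. It does not model the macro parent-repair strategy, the macro crossover operator, or the adaptive rate control.

```python
from typing import List, Sequence, Tuple

# Hypothetical node encoding used only for this sketch:
#   ("x", i)           -> the i-th input feature
#   ("tree", j)        -> the output of tree j in the same individual (a reused module)
#   (op, left, right)  -> a binary operator applied to two subtrees
Node = Tuple

OPS = {
    "add": lambda a, b: a + b,
    "sub": lambda a, b: a - b,
    "mul": lambda a, b: a * b,
}
SYMBOLS = {"add": "+", "sub": "-", "mul": "*"}


def eval_node(node: Node, x: Sequence[float], outputs: List[float], k: int) -> float:
    """Evaluate one node of tree k, given inputs x and the outputs of trees 0..k-1."""
    kind = node[0]
    if kind == "x":
        return float(x[node[1]])
    if kind == "tree":
        j = node[1]
        # Assumption: only earlier trees may be reused, so one left-to-right pass suffices.
        if j >= k:
            raise ValueError(f"tree {k} cannot reference tree {j}")
        return outputs[j]
    return OPS[kind](eval_node(node[1], x, outputs, k), eval_node(node[2], x, outputs, k))


def construct_features(individual: List[Node], x: Sequence[float]) -> List[float]:
    """Return one constructed feature per tree, evaluating the trees in order."""
    outputs: List[float] = []
    for k, tree in enumerate(individual):
        outputs.append(eval_node(tree, x, outputs, k))
    return outputs


def expand(individual: List[Node], k: int, node: Node = None) -> str:
    """Inline every module reference in tree k to show the equivalent flat expression."""
    node = individual[k] if node is None else node
    kind = node[0]
    if kind == "x":
        return f"x{node[1]}"
    if kind == "tree":
        return expand(individual, node[1])
    left = expand(individual, k, node[1])
    right = expand(individual, k, node[2])
    return f"({left} {SYMBOLS[kind]} {right})"


individual = [
    ("add", ("x", 0), ("x", 1)),        # tree 0: x0 + x1
    ("mul", ("tree", 0), ("tree", 0)),  # tree 1: reuses tree 0 twice
    ("sub", ("tree", 1), ("x", 0)),     # tree 2: reuses tree 1
]
print(construct_features(individual, [2.0, 3.0]))  # [5.0, 25.0, 23.0]
print(expand(individual, 2))  # (((x0 + x1) * (x0 + x1)) - x0)
```

In this toy example, tree 2 is stored as three nodes. Its inlined form repeats the subexpression `x0 + x1` twice, which illustrates how module reuse can keep expressive features compact.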
Pages: 1455-1469
Number of pages: 15