Improving Generalization of Genetic Programming for High-Dimensional Symbolic Regression with Shapley Value Based Feature Selection

Cited by: 0
Authors
Chunyu Wang [1]
Qi Chen [1]
Bing Xue [1]
Mengjie Zhang [1]
Affiliations
[1] Victoria University of Wellington, Centre for Data Science and Artificial Intelligence & School of Engineering and Computer Science
Keywords
Feature selection; Generalization; Genetic programming; Symbolic regression
DOI
10.1007/s41019-024-00270-x
Abstract
Symbolic Regression (SR) on high-dimensional datasets often encounters significant challenges, resulting in models with poor generalization capabilities. While feature selection has the potential to enhance generalization and learning performance, its application in Genetic Programming (GP) for high-dimensional SR remains a complex problem. Originating from game theory, the Shapley value is used in additive feature attribution approaches, where it distributes the difference between a model output and a baseline average across the input variables. By providing an accurate assessment of each feature's importance, the Shapley value offers a robust way to select features. In this paper, we propose a novel feature selection method that leverages the Shapley value to identify and select important features in GP for high-dimensional SR. Experiments on ten high-dimensional regression datasets indicate that our algorithm surpasses standard GP and other GP-based feature selection methods in learning and generalization performance on most datasets. Further analysis reveals that our algorithm generates more compact models that focus on important features.
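To make the attribution idea in the abstract concrete, the following is a minimal sketch of an exact Shapley value computation, not the authors' algorithm. It assumes that a feature "absent" from a coalition is replaced by a baseline value, and that features are kept when their attribution is non-negligible. The names `shapley_values`, `toy_model`, and the threshold `1e-9` are hypothetical and chosen only for illustration. Exact enumeration is exponential in the number of features, so a high-dimensional setting would need an approximation.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for one input x (illustrative sketch only).

    Assumption: a feature outside the coalition takes its baseline value.
    """
    n = len(x)

    def value(subset):
        # Features in `subset` keep their real values; the rest use the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Standard Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for combo in combinations(others, size):
                s = set(combo)
                # Marginal contribution of feature i to coalition s.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    # Hypothetical model in which feature 2 has no effect on the output.
    return 3.0 * z[0] + z[0] * z[1]


x = [2.0, 1.0, 5.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)

# The attributions sum to the gap between the output and the baseline output.
gap = toy_model(x) - toy_model(baseline)

# Hypothetical selection rule: keep features with non-negligible attribution.
selected = [i for i, p in enumerate(phi) if abs(p) > 1e-9]
```

In this toy case the attributions come out to roughly 7, 1, and 0, which sum to the gap of 8, so the irrelevant third feature is the only one dropped.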
Pages: 196-211
Page count: 15
Related papers
5 entries
  • [1] Feature Selection to Improve Generalization of Genetic Programming for High-Dimensional Symbolic Regression
    Chen, Qi
    Zhang, Mengjie
    Xue, Bing
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2017, 21 (05) : 792 - 806
  • [2] Genetic Programming for Feature Selection Based on Feature Removal Impact in High-Dimensional Symbolic Regression
    Al-Helali, Baligh
    Chen, Qi
    Xue, Bing
    Zhang, Mengjie
    IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE, 2024, 8 (03) : 2269 - 2282
  • [3] Genetic Programming with Embedded Feature Construction for High-Dimensional Symbolic Regression
    Chen, Qi
    Zhang, Mengjie
    Xue, Bing
    INTELLIGENT AND EVOLUTIONARY SYSTEMS, IES 2016, 2017, 8 : 87 - 102
  • [4] Improving Generalization of Genetic Programming for Symbolic Regression With Angle-Driven Geometric Semantic Operators
    Chen, Qi
    Xue, Bing
    Zhang, Mengjie
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2019, 23 (03) : 488 - 502
  • [5] Genetic Programming for Imputation Predictor Selection and Ranking in Symbolic Regression with High-Dimensional Incomplete Data
    Al-Helali, Baligh
    Chen, Qi
    Xue, Bing
    Zhang, Mengjie
    AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 : 523 - 535