Genetic Programming for Feature Selection Based on Feature Removal Impact in High-Dimensional Symbolic Regression

Cited by: 5
Authors
Al-Helali, Baligh [1, 2]
Chen, Qi [1, 2]
Xue, Bing [1, 2]
Zhang, Mengjie [1, 2]
Affiliations
[1] Victoria Univ Wellington, Ctr Data Sci & Artificial Intelligence, Wellington 6140, New Zealand
[2] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington 6140, New Zealand
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024, Vol. 8, No. 3
Keywords
Feature selection; genetic programming; high dimensionality; symbolic regression; FEATURE RANKING; CLASSIFICATION; EVOLUTIONARY
DOI
10.1109/TETCI.2024.3369407
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Symbolic regression (SR) is increasingly important for discovering mathematical models for various prediction tasks. It works by searching for the arithmetic expressions that best represent a target variable using a set of input features. However, as the number of features increases, the search process becomes more complex. To address high-dimensional symbolic regression, this work proposes a genetic programming method for feature selection based on the impact of feature removal on the performance of SR models. Unlike existing Shapley value methods, which simulate feature absence at the data level, the proposed approach removes features at the model level. This avoids producing unrealistic data instances, which is a major limitation of Shapley value and permutation-based methods. Moreover, after the importance of the features is calculated, a cut-off strategy is proposed for selecting important features: it injects a number of random features and uses their importance to set a threshold automatically. The experimental results on artificial and real-world high-dimensional data sets show that, compared with state-of-the-art feature selection methods using permutation importance and the Shapley value, the proposed method not only improves SR accuracy but also selects smaller sets of features.
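The random-feature cut-off described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the threshold is derived from the random features, so the sketch assumes the threshold is the highest importance among the injected random features. The importance score here is the absolute Pearson correlation with the target, a simple stand-in and not the model-level feature removal measure proposed in the paper. All function names and the `rand` prefix are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation; returns 0.0 if either input is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0


def inject_random_features(columns, n_random, rng):
    """Return a copy of the column dict with n_random pure-noise columns added."""
    n_rows = len(next(iter(columns.values())))
    augmented = dict(columns)
    for i in range(n_random):
        augmented[f"rand{i}"] = [rng.gauss(0.0, 1.0) for _ in range(n_rows)]
    return augmented


def select_by_random_cutoff(importance, random_prefix="rand"):
    """Keep real features that score strictly above the best random feature.

    ASSUMPTION: threshold = maximum importance among the injected random
    features. Requires at least one random feature to be present.
    """
    threshold = max(
        v for k, v in importance.items() if k.startswith(random_prefix)
    )
    selected = [
        k for k, v in importance.items()
        if not k.startswith(random_prefix) and v > threshold
    ]
    return selected, threshold


# Toy data: y depends on x0 and x1 only; x2 is an irrelevant real feature.
rng = random.Random(0)
n = 200
columns = {f"x{j}": [rng.gauss(0.0, 1.0) for _ in range(n)] for j in range(3)}
y = [2 * a - 3 * b for a, b in zip(columns["x0"], columns["x1"])]

augmented = inject_random_features(columns, n_random=5, rng=rng)
# Stand-in importance: |Pearson correlation| with the target (not the paper's measure).
importance = {name: abs(pearson(col, y)) for name, col in augmented.items()}
selected, threshold = select_by_random_cutoff(importance)
print("threshold:", round(threshold, 3), "selected:", selected)
```

Because the threshold comes from features known to be noise, no manual choice of the number of features to keep is needed. Any importance measure could be substituted for the correlation stand-in, as long as real and random features are scored in the same way.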
Pages: 2269-2282
Page count: 14
Related papers
50 in total
  • [1] Feature Selection to Improve Generalization of Genetic Programming for High-Dimensional Symbolic Regression
    Chen, Qi
    Zhang, Mengjie
    Xue, Bing
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2017, 21 (05) : 792 - 806
  • [2] Genetic Programming with Embedded Feature Construction for High-Dimensional Symbolic Regression
    Chen, Qi
    Zhang, Mengjie
    Xue, Bing
    INTELLIGENT AND EVOLUTIONARY SYSTEMS, IES 2016, 2017, 8 : 87 - 102
  • [3] Genetic Programming for Feature Selection and Construction to High-Dimensional Data
    Ma, Jianbin
    Zhu, Man
    2024 4TH INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND INTELLIGENT SYSTEMS ENGINEERING, MLISE 2024, 2024, : 196 - 200
  • [5] Genetic programming for feature construction and selection in classification on high-dimensional data
    Tran, Binh
    Xue, Bing
    Zhang, Mengjie
    MEMETIC COMPUTING, 2016, 8 (01) : 3 - 15
  • [6] A new representation in genetic programming with hybrid feature ranking criterion for high-dimensional feature selection
    Li, Jiayi
    Zhang, Fan
    Ma, Jianbin
    COMPLEX & INTELLIGENT SYSTEMS, 2025, 11 (04)
  • [7] Genetic Programming for Imputation Predictor Selection and Ranking in Symbolic Regression with High-Dimensional Incomplete Data
    Al-Helali, Baligh
    Chen, Qi
    Xue, Bing
    Zhang, Mengjie
    AI 2019: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, 11919 : 523 - 535
  • [8] Improving Generalization of Genetic Programming for High-Dimensional Symbolic Regression with Shapley Value Based Feature Selection
    Chunyu Wang
    Qi Chen
    Bing Xue
    Mengjie Zhang
    Data Science and Engineering, 2025, 10 (2) : 196 - 211
  • [9] Multi Hive Artificial Bee Colony Programming for high dimensional symbolic regression with feature selection
    Arslan, Sibel
    Ozturk, Celal
    APPLIED SOFT COMPUTING, 2019, 78 : 515 - 527
  • [10] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17