Genetic Programming for Feature Selection Based on Feature Removal Impact in High-Dimensional Symbolic Regression

Cited by: 5
Authors
Al-Helali, Baligh [1, 2]
Chen, Qi [1, 2]
Xue, Bing [1, 2]
Zhang, Mengjie [1, 2]
Affiliations
[1] Victoria Univ Wellington, Ctr Data Sci & Artificial Intelligence, Wellington 6140, New Zealand
[2] Victoria Univ Wellington, Sch Engn & Comp Sci, Wellington 6140, New Zealand
Source
IEEE TRANSACTIONS ON EMERGING TOPICS IN COMPUTATIONAL INTELLIGENCE | 2024, Vol. 8, No. 3
Keywords
Feature selection; genetic programming; high dimensionality; symbolic regression; FEATURE RANKING; CLASSIFICATION; EVOLUTIONARY;
DOI
10.1109/TETCI.2024.3369407
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Symbolic regression (SR) is increasingly important for discovering mathematical models for various prediction tasks. It works by searching for the arithmetic expressions that best represent a target variable using a set of input features. However, as the number of features increases, the search process becomes more complex. To address high-dimensional symbolic regression, this work proposes a genetic programming-based feature selection method that scores each feature by the impact its removal has on the performance of SR models. Unlike existing Shapley value methods that simulate feature absence at the data level, the proposed approach removes features at the model level. This circumvents the production of unrealistic data instances, which is a major limitation of Shapley value and permutation-based methods. Moreover, after the feature importances are calculated, a cut-off strategy is proposed for selecting the important features: it injects a number of random features and uses their importance to set a threshold automatically. The experimental results on artificial and real-world high-dimensional data sets show that, compared with state-of-the-art feature selection methods using permutation importance and the Shapley value, the proposed method not only improves SR accuracy but also selects smaller sets of features.
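The two ideas in the abstract, model-level feature removal and a cut-off derived from injected random features, can be illustrated with a minimal sketch. This is not the authors' algorithm: a least-squares linear model stands in for a GP-evolved expression, "removal" is assumed to mean deleting a feature's term from the fitted model without refitting, and the threshold is assumed to be the largest importance among the injected random features. The names `removal_impact` and `select_features` and the parameter `n_random` are hypothetical.

```python
import numpy as np

def removal_impact(X, y, weights):
    """Increase in MSE when each feature's term is deleted from a fixed linear model."""
    base_mse = np.mean((X @ weights - y) ** 2)
    impacts = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        pruned = weights.copy()
        pruned[j] = 0.0  # remove the term from the model; the data are left untouched
        impacts[j] = np.mean((X @ pruned - y) ** 2) - base_mse
    return impacts

def select_features(X, y, n_random=10, seed=0):
    """Keep real features whose impact exceeds that of every injected random feature."""
    rng = np.random.default_rng(seed)
    n_real = X.shape[1]
    # Assumption: injected features are i.i.d. standard normal noise columns.
    noise = rng.standard_normal((X.shape[0], n_random))
    X_aug = np.hstack([X, noise])
    # Stand-in for the SR model: ordinary least squares on real + random features.
    weights, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    impacts = removal_impact(X_aug, y, weights)
    # Assumption: the threshold is the maximum impact among the random features.
    threshold = impacts[n_real:].max()
    return [j for j in range(n_real) if impacts[j] > threshold]

# Synthetic data: only features 0 and 3 influence the target.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(200)
selected = select_features(X, y)
print(selected)
```

Because the data matrix is never perturbed, no artificial instances are generated, which is the contrast with permutation-based scoring that the abstract draws. With this simple maximum-based threshold, a few irrelevant features may still pass by chance; the paper's actual thresholding rule is not given in the abstract.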
Pages: 2269-2282
Number of pages: 14