Maximal cliques-based hybrid high-dimensional feature selection with interaction screening for regression

Cited by: 1
Authors
Chamlal, Hasna [1 ]
Benzmane, Asmaa [1 ]
Ouaderhman, Tayeb [1 ]
Affiliations
[1] Hassan II Univ Casablanca, Fac Sci Ain Chock, Dept Math & Informat, Casablanca, Morocco
Keywords
Feature selection; Maximal clique; Rank correlation; High-dimensional data; Regression; GENERALIZED LINEAR-MODELS; GENETIC ALGORITHM; KOLMOGOROV FILTER; MARKOV BLANKET; BREAST-CANCER; EXPRESSION; REGULARIZATION; SEARCH;
DOI
10.1016/j.neucom.2024.128361
CLC classification number
TP18 [Artificial Intelligence Theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Feature selection has been studied extensively in the literature, as it plays a significant role in both supervised and unsupervised machine learning tasks. Because most features in high-dimensional data sets may be uninformative, feature selection is key to removing irrelevant variables and improving prediction and data analysis performance. Many existing feature selection methods, however, become ineffective on contemporary datasets, where the number of features grows much faster than the sample size. This paper introduces a novel supervised feature selection method for regression problems, called maximal Clique with Interaction Screening (ISClique). The ISClique algorithm proceeds in two steps. First, a filter approach selects relevant features from the initial feature space and examines the interactions between them, using a new coefficient based on Kendall's tau and partial Kendall's tau. Second, the maximal clique strategy is applied as a wrapper to the set selected in the previous step to construct candidate feature subsets, and the subset that minimizes prediction error is retained. The proposed method combines the advantages of graph theory with feature screening. Moreover, because the criteria used to build ISClique accommodate variable heterogeneity, the method is equally suitable for classification tasks. The proposed hybrid approach has been evaluated on various simulation scenarios and real datasets, and the experimental findings demonstrate its advantages over comparable methods.
Pages: 22
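The abstract above outlines the two-step structure of ISClique: a Kendall's-tau-based screening filter followed by a maximal-clique wrapper search. The Python sketch below illustrates that structure under simplifying assumptions; the thresholds, the pairwise feature graph (standing in for the paper's partial-Kendall's-tau interaction criterion), and the linear base learner are illustrative choices, not the authors' exact procedure.

```python
# Hypothetical sketch of a two-step "screen, then search maximal cliques"
# pipeline; values and graph construction are illustrative assumptions.
import numpy as np
import networkx as nx
from scipy.stats import kendalltau
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def isclique_sketch(X, y, screen_threshold=0.1, edge_threshold=0.3, cv=5):
    """Screen features with Kendall's tau, then pick the best maximal clique."""
    n_features = X.shape[1]

    # Step 1 (filter): keep features whose marginal Kendall's tau with the
    # response exceeds a screening threshold.
    kept = []
    for j in range(n_features):
        tau, _ = kendalltau(X[:, j], y)
        if abs(tau) > screen_threshold:
            kept.append(j)

    # Build a graph on the retained features: connect two features whose
    # pairwise association is strong (a stand-in for the paper's
    # interaction criterion based on partial Kendall's tau).
    G = nx.Graph()
    G.add_nodes_from(kept)
    for i, a in enumerate(kept):
        for b in kept[i + 1:]:
            tau_ab, _ = kendalltau(X[:, a], X[:, b])
            if abs(tau_ab) > edge_threshold:
                G.add_edge(a, b)

    # Step 2 (wrapper): treat every maximal clique as a candidate feature
    # subset and keep the one with the lowest cross-validated MSE.
    best_subset, best_mse = None, np.inf
    for clique in nx.find_cliques(G):
        subset = sorted(clique)
        scores = cross_val_score(LinearRegression(), X[:, subset], y,
                                 scoring="neg_mean_squared_error", cv=cv)
        mse = -scores.mean()
        if mse < best_mse:
            best_subset, best_mse = subset, mse
    return best_subset, best_mse
```

In this sketch, maximal cliques are enumerated with `networkx.find_cliques`, so isolated screened features are still evaluated as singleton subsets; the subset returned is simply the clique with the smallest cross-validated prediction error.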