A transparent and nonlinear method for variable selection

Cited by: 3
Authors
Wang, Keyao [1 ,2 ]
Wang, Huiwen [1 ,3 ]
Zhao, Jichang [1 ]
Wang, Lihong [4 ]
Affiliations
[1] Beihang Univ, Sch Econ & Management, Beijing, Peoples R China
[2] Beijing Key Lab Emergency Support Simulat Technol, Beijing, Peoples R China
[3] Beihang Univ, Key Lab Complex Syst Anal Management & Decis, Minist Educ, Beijing, Peoples R China
[4] Coordinat Ctr China, Natl Comp Network Emergency Response Tech Team, Beijing, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Variable selection; High-dimensional; Interpretation; Nonlinear relevance; GRAM-SCHMIDT ORTHOGONALIZATION; REGRESSION; LIKELIHOOD;
DOI
10.1016/j.eswa.2023.121398
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Variable selection is a procedure for identifying the truly important predictors among the inputs. Complex nonlinear dependencies and strong coupling pose great challenges for variable selection in high-dimensional data, and real-world applications have increased the demand for interpretable selection processes: a pragmatic approach should not only yield the most predictive covariates but also provide ample, easy-to-understand reasons for removing the others. In view of these requirements, this paper proposes an approach for transparent and nonlinear variable selection. To transparently decouple the information within the input predictors, a three-step heuristic search is designed that groups the input predictors into four subsets: relevant predictors, which are selected, and uninformative, redundant, and conditionally independent predictors, which are removed. A nonlinear partial correlation coefficient is introduced to better identify predictors that have a nonlinear functional dependence on the response. The selected subset is a competent input for commonly used predictive models. The superiority of the proposed method over state-of-the-art baselines is demonstrated in terms of predictive accuracy and model interpretability.
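The "nonlinear relevance" the abstract refers to can be illustrated with Chatterjee's rank correlation coefficient ξ, a simple measure that is near 0 under independence and approaches 1 when the response is any (possibly nonmonotonic) function of a predictor. This is a generic sketch of that family of measures, not the paper's actual coefficient; the function name `chatterjee_xi` is ours.

```python
import numpy as np

def chatterjee_xi(x, y):
    """Chatterjee's rank correlation xi_n (assumes no ties in x or y).

    Near 0 when x and y are independent; tends to 1 when
    y = f(x) for some measurable f, even a nonmonotonic one.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    order = np.argsort(x)                          # sort pairs by x
    ranks = np.argsort(np.argsort(y[order])) + 1   # ranks of y along that order
    # xi_n = 1 - 3 * sum |r_{i+1} - r_i| / (n^2 - 1)
    return 1.0 - 3.0 * np.abs(np.diff(ranks)).sum() / (n**2 - 1)
```

For example, a quadratic relationship y = x² has a Pearson correlation near zero on a symmetric domain but a ξ close to 1, which is why such rank-based coefficients suit the nonlinear screening task described above.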
Pages: 13