Surrogate-Assisted Genetic Algorithm for Wrapper Feature Selection

Cited by: 13
Authors
Altarabichi, Mohammed Ghaith [1 ]
Nowaczyk, Slawomir [1 ]
Pashami, Sepideh [1 ]
Mashhadi, Peyman Sheikholharam [1 ]
Affiliations
[1] Halmstad Univ, Ctr Appl Intelligent Syst Res, Halmstad, Sweden
Source
2021 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC 2021) | 2021
Keywords
Feature selection; Wrapper; Genetic Algorithm; Progressive Sampling; Surrogates; Meta-models; Evolution Control; Optimization
DOI
10.1109/CEC45853.2021.9504718
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Feature selection is an intractable problem; therefore, practical algorithms often trade off solution accuracy against computation time. In this paper, we propose a novel multi-stage feature selection framework utilizing multiple levels of approximations, or surrogates. Such a framework allows wrapper approaches to be used in a much more computationally efficient way, significantly increasing the quality of achievable feature selection solutions, especially on large datasets. We design and evaluate a Surrogate-Assisted Genetic Algorithm (SAGA) that uses this concept to guide the evolutionary search during the early exploration phase; SAGA switches to evaluating the original function only in the final exploitation phase. We prove that the run-time upper bound of SAGA's surrogate-assisted stage is at worst equal to that of the wrapper GA, and that it scales better for induction algorithms with a high order of complexity in the number of instances. Using 14 datasets from the UCI ML repository, we demonstrate that in practice SAGA significantly reduces computation time compared to a baseline wrapper Genetic Algorithm (GA), while converging to solutions of significantly higher accuracy. Our experiments show that SAGA arrives at near-optimal solutions, on average, three times faster than a wrapper GA. We also showcase the importance of the evolution control approach, designed to prevent surrogates from misleading the evolutionary search towards false optima.
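The two-phase scheme described in the abstract (cheap surrogate evaluations during exploration, exact wrapper evaluations for control and exploitation) can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the fitness functions, population size, and the synthetic notion of "relevant features" are invented for the example, and the surrogate is modeled simply as a noisy, cheaper stand-in for the true wrapper objective (e.g. a classifier trained on a subsample of instances).

```python
import random

random.seed(0)

N_FEATURES = 20
RELEVANT = set(range(5))  # hypothetical ground-truth informative features


def true_fitness(mask):
    # Stand-in for a full wrapper evaluation (e.g. cross-validated accuracy):
    # reward selecting relevant features, lightly penalize irrelevant ones.
    chosen = {i for i, bit in enumerate(mask) if bit}
    return len(chosen & RELEVANT) / len(RELEVANT) - 0.01 * len(chosen - RELEVANT)


def surrogate_fitness(mask, noise=0.1):
    # Cheap approximation: the same signal corrupted by noise, mimicking a
    # model trained on a small subsample of instances (progressive sampling).
    return true_fitness(mask) + random.uniform(-noise, noise)


def evolve(pop, fitness, generations):
    # Simple elitist GA: keep the top half, refill with crossover + mutation.
    for _ in range(generations):
        elite = sorted(pop, key=fitness, reverse=True)[: len(pop) // 2]
        children = []
        while len(elite) + len(children) < len(pop):
            a, b = random.sample(elite, 2)
            cut = random.randrange(1, N_FEATURES)          # one-point crossover
            child = a[:cut] + b[cut:]
            i = random.randrange(N_FEATURES)               # single bit-flip mutation
            child = child[:i] + (1 - child[i],) + child[i + 1:]
            children.append(child)
        pop = elite + children
    return pop


pop = [tuple(random.randint(0, 1) for _ in range(N_FEATURES)) for _ in range(30)]
# Exploration phase: evolve under the cheap surrogate.
pop = evolve(pop, surrogate_fitness, generations=15)
# Evolution control / exploitation phase: re-rank survivors on the true function
# so surrogate noise cannot promote a false optimum to the final answer.
best = max(pop, key=true_fitness)
print(sorted(i for i, bit in enumerate(best) if bit))
```

The key design point mirrored from the paper is the switch of fitness functions: the surrogate carries the bulk of the (many) early evaluations, while the expensive true objective is reserved for the final re-ranking, which acts as the evolution-control safeguard against surrogate-induced false optima.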
Pages: 776-785
Page count: 10