A GRASP algorithm for fast hybrid (filter-wrapper) feature subset selection in high-dimensional datasets

被引:86
作者
Bermejo, Pablo [1 ]
Gamez, Jose A. [1 ]
Puerta, Jose M. [1 ]
机构
[1] Univ Castilla La Mancha, Intelligent Syst & Data Min Lab, Comp Syst Dept, Albacete 02071, Spain
关键词
Feature selection; Classification; GRASP; Filter; Wrapper; High-dimensional datasets; SEARCH METHODS;
D O I
10.1016/j.patrec.2010.12.016
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Feature subset selection is a key problem in the data-mining classification task that helps to obtain more compact and understandable models without degrading (or even improving) their performance. In this work we focus on FSS in high-dimensional datasets, that is, with a very large number of predictive attributes. In this case, standard sophisticated wrapper algorithms cannot be applied because of their complexity, and computationally lighter filter-wrapper algorithms have recently been proposed. In this work we propose a stochastic algorithm based on the GRASP meta-heuristic, with the main goal of speeding up the feature subset selection process, basically by reducing the number of wrapper evaluations to carry out. GRASP is a multi-start constructive method which constructs a solution in its first stage, and then runs an improving stage over that solution. Several instances of the proposed GRASP method are experimentally tested and compared with state-of-the-art algorithms over 12 high-dimensional datasets. The statistical analysis of the results shows that our proposal is comparable in accuracy and cardinality of the selected subset to previous algorithms, but requires significantly fewer evaluations. (C) 2010 Elsevier BM. All rights reserved.
引用
收藏
页码:701 / 711
页数:11
相关论文
共 37 条
[1]  
AHA DW, 1991, MACH LEARN, V6, P37, DOI 10.1007/BF00153759
[2]  
[Anonymous], 1998, Ph.D. Thesis
[3]  
[Anonymous], 1973, Pattern Classification and Scene Analysis
[4]  
Bermejo J.P.P., 2008, INCREMENTAL WRAPPER
[5]   Incremental Wrapper-based Subset Selection with Replacement: an advantageous alternative to sequential forward selection [J].
Bermejo, Pablo ;
Gamez, Jose A. ;
Puerta, Jose M. .
2009 IEEE SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DATA MINING, 2009, :367-374
[6]  
Bishop CM., 1995, NEURAL NETWORKS PATT
[7]  
BLANCO R, 2001, P WORKSH BAY MOD MED, P29
[8]  
Boser B. E., 1992, Proceedings of the Fifth Annual ACM Workshop on Computational Learning Theory, P144, DOI 10.1145/130385.130401
[9]  
Cano JR, 2002, LECT NOTES ARTIF INT, V2527, P214
[10]  
CASADOYUSTA S, 2009, PATTERN RECOGN, V30, P525