Fast wrapper feature subset selection in high-dimensional datasets by means of filter re-ranking

Cited by: 111
Authors
Bermejo, Pablo [1 ]
de la Ossa, Luis [1 ]
Gamez, Jose A. [1 ]
Puerta, Jose M. [1 ]
Affiliations
[1] Univ Castilla La Mancha, Intelligent Syst & Data Min Lab I3A, Dept Comp Syst, Albacete 02071, Spain
Keywords
Feature subset selection; High-dimensional datasets; Wrapper algorithms; Filter measures; Complexity; Rank-based algorithms; Re-ranking; STATISTICAL COMPARISONS; MUTUAL INFORMATION; CLASSIFIERS;
DOI
10.1016/j.knosys.2011.01.015
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
This paper deals with the problem of supervised wrapper-based feature subset selection in datasets with a very large number of attributes. The recent literature contains numerous references to hybrid selection algorithms: based on a filter ranking, they perform an incremental wrapper selection over that ranking. Although these methods work well, they still have two problems: (1) depending on the complexity of the wrapper search method, the number of wrapper evaluations can still be too large; and (2) they rely on a univariate ranking that does not take into account the interaction between the variables already included in the selected subset and the remaining ones. Here we propose a new approach whose main goal is to drastically reduce the number of wrapper evaluations while maintaining good performance (e.g. accuracy and size of the obtained subset). To do this we propose an algorithm that iteratively alternates between filter ranking construction and wrapper feature subset selection (FSS). Thus, the FSS only uses the first block of ranked attributes, and the ranking method uses the currently selected subset in order to build a new ranking in which this knowledge is considered. The algorithm terminates when no new attribute is selected in the last call to the FSS algorithm. The main advantage of this approach is that only a few blocks of variables are analyzed, and so the number of wrapper evaluations decreases drastically. The proposed method is tested over eleven high-dimensional datasets (2400-46,000 variables) using different classifiers. The results show an impressive reduction in the number of wrapper evaluations without degrading the quality of the obtained subset. (C) 2011 Elsevier B.V. All rights reserved.
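The alternation the abstract describes can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: `filter_score` stands in for any conditional filter measure (e.g. one that accounts for the selected subset), and `wrapper_eval` stands in for a classifier evaluation such as cross-validated accuracy; both names and the greedy in-block search are assumptions for illustration.

```python
def rerank_wrapper_fss(features, filter_score, wrapper_eval, block_size=5):
    """Iteratively alternate filter re-ranking and wrapper selection.

    filter_score(f, selected) -> float  # relevance of f given current subset
    wrapper_eval(subset)      -> float  # e.g. CV accuracy of a classifier
    """
    selected = []
    best = wrapper_eval(selected)
    while True:
        # Re-rank the remaining attributes conditioned on the selected subset,
        # so interactions with already-chosen variables influence the order.
        remaining = [f for f in features if f not in selected]
        if not remaining:
            break
        remaining.sort(key=lambda f: filter_score(f, selected), reverse=True)

        # Incremental wrapper search restricted to the first block only:
        # this is what keeps the number of wrapper evaluations small.
        added = False
        for f in remaining[:block_size]:
            score = wrapper_eval(selected + [f])
            if score > best:
                selected.append(f)
                best = score
                added = True

        # Stop when the last FSS call selected no new attribute.
        if not added:
            break
    return selected, best
```

With a toy filter and wrapper (two informative features out of ten), the loop selects both informative features in the first block and stops after one fruitless pass, so only a handful of wrapper evaluations are ever performed.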
Pages: 35-44
Page count: 10