Sequential random k-nearest neighbor feature selection for high-dimensional data

Cited by: 74
Authors
Park, Chan Hee [1]
Kim, Seoung Bum [1]
Affiliations
[1] Korea Univ, Sch Ind Management Engn, Seoul, South Korea
Funding
National Research Foundation of Singapore;
Keywords
Feature selection; High dimensionality; Ensemble; Wrapper; Random forest; k-NN; CLASSIFICATION; FOREST;
DOI
10.1016/j.eswa.2014.10.044
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Feature selection based on an ensemble classifier has been recognized as a crucial technique for modeling high-dimensional data. Feature selection based on the random forests model, which is constructed by aggregating multiple decision tree classifiers, has been widely used. However, a lack of stability and balance in decision trees decreases the robustness of random forests. This limitation motivated us to propose a feature selection method based on newly designed nearest-neighbor ensemble classifiers. The proposed method finds significant features by using an iterative procedure. We performed experiments with 20 datasets of microarray gene expressions to examine the property of the proposed method and compared it with random forests. The results demonstrated the effectiveness and robustness of the proposed method, especially when the number of features exceeds the number of observations. (C) 2014 Elsevier Ltd. All rights reserved.
Pages: 2336-2342
Number of pages: 7