Recent advances in feature selection and its applications

Cited by: 290
Authors
Li, Yun [1, 2]
Li, Tao [1, 2, 3]
Liu, Huan [4]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing, Jiangsu, Peoples R China
[3] Florida Int Univ, Sch Comp Sci, Miami, FL 33199 USA
[4] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, Tempe, AZ USA
Keywords
Feature selection; Survey; Data mining; ONLINE FEATURE-SELECTION; GENE SELECTION; CLASSIFICATION; CANCER; REGRESSION; RELEVANCE; SECURITY;
DOI
10.1007/s10115-017-1059-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Feature selection is one of the key problems in machine learning and data mining. This review gives a brief historical background of the field, followed by a selection of challenges of particular current interest, such as feature selection for high-dimensional small-sample-size data, feature selection for large-scale data, and secure feature selection. Along with these challenges, several hot topics have emerged, e.g., stable feature selection, multi-view feature selection, distributed feature selection, multi-label feature selection, online feature selection, and adversarial feature selection. Recent advances on these topics are then surveyed. For each topic, the existing problems are analyzed, and current solutions to them are presented and discussed. In addition, some representative applications of feature selection are introduced, such as applications in bioinformatics, social media, and multimedia retrieval.
Pages: 551-577
Number of pages: 27