Recent advances in feature selection and its applications

Cited by: 269
Authors
Li, Yun [1, 2]
Li, Tao [1, 2, 3]
Liu, Huan [4]
Affiliations
[1] Nanjing Univ Posts & Telecommun, Sch Comp Sci & Technol, Nanjing, Jiangsu, Peoples R China
[2] Nanjing Univ Posts & Telecommun, Jiangsu Key Lab Big Data Secur & Intelligent Proc, Nanjing, Jiangsu, Peoples R China
[3] Florida Int Univ, Sch Comp Sci, Miami, FL 33199 USA
[4] Arizona State Univ, Sch Comp Informat & Decis Syst Engn, Tempe, AZ USA
Keywords
Feature selection; Survey; Data mining; ONLINE FEATURE-SELECTION; GENE SELECTION; CLASSIFICATION; CANCER; REGRESSION; RELEVANCE; SECURITY;
DOI
10.1007/s10115-017-1059-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
Feature selection is one of the key problems in machine learning and data mining. This review gives a brief historical background of the field, followed by a selection of challenges of particular current interest, such as feature selection for high-dimensional small-sample-size data, large-scale data, and secure feature selection. Along with these challenges, several hot topics in feature selection have emerged, e.g., stable feature selection, multi-view feature selection, distributed feature selection, multi-label feature selection, online feature selection, and adversarial feature selection. Recent advances on these topics are then surveyed. For each topic, the existing problems are analyzed, and current solutions to these problems are presented and discussed. In addition to these topics, some representative applications of feature selection are introduced, such as applications in bioinformatics, social media, and multimedia retrieval.
Pages: 551-577
Page count: 27
Related papers
104 in total
  • [51] Stable feature selection for biomarker discovery
    He, Zengyou
    Yu, Weichuan
    [J]. COMPUTATIONAL BIOLOGY AND CHEMISTRY, 2010, 34 (04) : 215 - 225
  • [52] Reducing the dimensionality of data with neural networks
    Hinton, G. E.
    Salakhutdinov, R. R.
    [J]. SCIENCE, 2006, 313 (5786) : 504 - 507
  • [53] ON MEAN ACCURACY OF STATISTICAL PATTERN RECOGNIZERS
    HUGHES, GF
    [J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 1968, 14 (01) : 55 - +
  • [54] Filter versus wrapper gene selection approaches in DNA microarray domains
    Inza, I
    Larrañaga, P
    Blanco, R
    Cerrolaza, AJ
    [J]. ARTIFICIAL INTELLIGENCE IN MEDICINE, 2004, 31 (02) : 91 - 103
  • [55] Similarity-based online feature selection in content-based image retrieval
    Jiang, W
    Er, G
    Dai, QH
    Gu, JW
    [J]. IEEE TRANSACTIONS ON IMAGE PROCESSING, 2006, 15 (03) : 702 - 712
  • [56] Jun Yan, 2005, SIGIR 2005. Proceedings of the Twenty-Eighth Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, P122, DOI 10.1145/1076034.1076058
  • [57] A comparative study of iterative and non-iterative feature selection techniques for software defect prediction
    Khoshgoftaar, Taghi M.
    Gao, Kehan
    Napolitano, Amri
    Wald, Randall
    [J]. INFORMATION SYSTEMS FRONTIERS, 2014, 16 (05) : 801 - 822
  • [58] Wrappers for feature subset selection
    Kohavi, R
    John, GH
    [J]. ARTIFICIAL INTELLIGENCE, 1997, 97 (1-2) : 273 - 324
  • [59] gMLC: a multi-label feature selection framework for graph classification
    Kong, Xiangnan
    Yu, Philip S.
    [J]. KNOWLEDGE AND INFORMATION SYSTEMS, 2012, 31 (02) : 281 - 305
  • [60] Lee J, 2010, PROCEEDINGS OF THE 17TH INTERNATIONAL CONGRESS ON SOUND AND VIBRATION