Feature selection techniques for machine learning: a survey of more than two decades of research

Cited by: 67
Authors
Theng, Dipti [1 ]
Bhoyar, Kishor K. [2 ]
Affiliations
[1] YCCE, Nagpur, Maharashtra, India
[2] YCCE, Dept Informat Technol, Nagpur, Maharashtra, India
Funding
UK Research and Innovation (UKRI);
Keywords
Feature selection; Machine learning; High-dimensional data; Filter techniques; Wrapper techniques; Embedded techniques; REGRESSION; FRAMEWORK; REDUCTION; STABILITY; NETWORK;
DOI
10.1007/s10115-023-02010-5
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Learning algorithms can be less effective on datasets with an extensive feature space due to the presence of irrelevant and redundant features. Feature selection effectively reduces the dimensionality of the feature space by eliminating irrelevant and redundant features without significantly degrading the decision quality of the trained model. Over the last few decades, numerous algorithms have been developed to identify the most significant features for specific learning tasks. Each algorithm has its advantages and disadvantages, and it is the data scientist's responsibility to determine whether a specific algorithm suits a particular task. However, given the vast number of available feature selection algorithms, selecting the appropriate one can be daunting even for an expert. These challenges motivated us to analyze the properties of algorithms together with dataset characteristics. This paper presents a comprehensive review of existing feature selection algorithms, with an exhaustive analysis of their properties and relative performance. It also addresses the evolution, formulation, and usefulness of these algorithms. The manuscript further categorizes the reviewed algorithms by the properties required for a specific dataset and the objective under study. Additionally, it discusses popular area-specific feature selection techniques. Finally, it identifies and discusses open research challenges in feature selection that are yet to be overcome.
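As a concrete illustration of the filter-style techniques named in the keywords, the minimal Python sketch below ranks features by a univariate relevance score and keeps only the highest-scoring ones. It assumes scikit-learn is available; the synthetic dataset, the mutual-information score function, and the choice of k are illustrative assumptions, not the specific method advocated by the survey.

```python
# Minimal sketch of filter-based feature selection (illustrative only).
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif

# Synthetic high-dimensional data: 200 samples, 50 features, only 5 informative.
X, y = make_classification(n_samples=200, n_features=50, n_informative=5,
                           n_redundant=10, random_state=0)

# Rank each feature by mutual information with the target and keep the top 10.
selector = SelectKBest(score_func=mutual_info_classif, k=10)
X_reduced = selector.fit_transform(X, y)

print(X.shape, "->", X_reduced.shape)  # (200, 50) -> (200, 10)
print("Selected feature indices:", selector.get_support(indices=True))
```

Filter techniques such as this score features independently of any learner, which makes them fast on high-dimensional data; wrapper and embedded techniques, by contrast, tie the selection to a specific model's performance.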
Pages: 1575-1637
Page count: 63