Feature selection techniques for machine learning: a survey of more than two decades of research

Cited by: 0
Authors
Dipti Theng
Kishor K. Bhoyar
Affiliations
[1] Department of Information Technology, YCCE
[2] YCCE
Source
Knowledge and Information Systems | 2024 / Vol. 66
Keywords
Feature selection; Machine learning; High-dimensional data; Filter techniques; Wrapper techniques; Embedded techniques;
DOI
Not available
Abstract
Learning algorithms can be less effective on datasets with an extensive feature space due to the presence of irrelevant and redundant features. Feature selection is a technique that reduces the dimensionality of the feature space by eliminating irrelevant and redundant features without significantly affecting the decision-making quality of the trained model. Over the last few decades, numerous algorithms have been developed to identify the most significant features for specific learning tasks. Each algorithm has its advantages and disadvantages, and it is the responsibility of the data scientist to determine the suitability of a specific algorithm for a particular task. However, given the vast number of available feature selection algorithms, choosing the appropriate one can be daunting even for an expert. These challenges have motivated us to analyze the properties of algorithms and dataset characteristics together. This paper presents a comprehensive review of existing feature selection algorithms, with an exhaustive analysis of their properties and relative performance. It also addresses the evolution, formulation, and usefulness of these algorithms. The manuscript further categorizes the reviewed algorithms based on the properties required for a specific dataset and study objective, discusses popular area-specific feature selection techniques, and finally identifies open research challenges in feature selection that are yet to be overcome.
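As a concrete illustration of the filter category named in the keywords (not taken from the surveyed paper), the following minimal Python sketch uses scikit-learn to rank features by mutual information with the label, keep the top 20, and evaluate a downstream classifier; the synthetic dataset and the choice of k = 20 are assumptions for demonstration only.

    # Minimal sketch of a filter-style feature selection step (illustrative only).
    from sklearn.datasets import make_classification
    from sklearn.feature_selection import SelectKBest, mutual_info_classif
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline

    # Synthetic high-dimensional data: 20 informative features out of 200.
    X, y = make_classification(n_samples=500, n_features=200,
                               n_informative=20, random_state=0)

    # Filter technique: score each feature by mutual information with the
    # label, keep the 20 highest-scoring features, then fit the classifier.
    model = make_pipeline(
        SelectKBest(score_func=mutual_info_classif, k=20),
        LogisticRegression(max_iter=1000),
    )
    print(cross_val_score(model, X, y, cv=5).mean())

Wrapper and embedded techniques differ only in where the selection happens: a wrapper would search feature subsets by repeatedly training the model, while an embedded method (e.g., L1-regularized models) performs selection during training itself.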
Pages: 1575-1637
Page count: 62