A tutorial-based survey on feature selection: Recent advancements on feature selection

Cited by: 22
Authors
Moslemi, Amir [1]
Affiliations
[1] Sunnybrook Hlth Sci Ctr, Imaging Res & Phys Sci, Toronto, ON M4N 3M5, Canada
Keywords
Feature selection; Matrix factorization; Sparse representation learning; Information theory; Evolutionary computation; Reinforcement learning; UNSUPERVISED FEATURE-SELECTION; SUPERVISED FEATURE-SELECTION; NONNEGATIVE MATRIX FACTORIZATION; PARTICLE SWARM OPTIMIZATION; EFFICIENT FEATURE-SELECTION; SPARSE FEATURE-SELECTION; LABEL FEATURE-SELECTION; HESITANT FUZZY-SETS; GENETIC ALGORITHM; MUTUAL INFORMATION;
DOI
10.1016/j.engappai.2023.107136
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The curse of dimensionality has been a major challenge in data mining, pattern recognition, computer vision, and machine learning in recent years. Feature selection and feature extraction are the two main approaches to circumventing this challenge. The main objective of feature selection is to remove redundant features and preserve relevant ones in order to improve the performance of the learning algorithm. This survey provides a comprehensive overview of state-of-the-art feature selection techniques, including mathematical formulas and fundamental algorithms to facilitate understanding. It covers approaches to feature selection that can be categorized into five domains: A) subspace learning, which involves matrix factorization and matrix projection; B) sparse representation learning, which includes compressed sensing and dictionary learning; C) information theory, which covers multi-label neighborhood entropy, symmetrical uncertainty, Monte Carlo, and Markov blanket; D) evolutionary computational algorithms, including the genetic algorithm (GA), particle swarm optimization (PSO), ant colony (AC), and grey wolf optimization (GWO); and E) reinforcement learning techniques. This survey can help researchers acquire a deep understanding of feature selection techniques and choose an appropriate one. Moreover, researchers can choose one of the domains A to E to study in depth in future work. A potential avenue for future research is reducing computational complexity while maintaining performance, that is, achieving a more efficient balance between computational resources and overall performance. For matrix-based techniques, the main limitation is the need to tune the coefficients of the regularization terms, which can be challenging and time-consuming. For evolutionary computational techniques, the two main limitations are getting stuck in local minima and finding an appropriate objective function.
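The stated objective of removing redundant features while preserving relevant ones can be illustrated with a small information-theoretic sketch. The code below is not taken from the survey. It is a minimal greedy filter in the spirit of minimum-redundancy, maximum-relevance selection, assuming discrete features; the function names `mutual_information` and `greedy_select` and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def greedy_select(features, labels, k):
    """Greedily pick k feature names, scoring each candidate as
    relevance to the labels minus mean redundancy with already-picked features."""
    selected = []
    remaining = list(features)  # keeps dict insertion order for determinism

    def score(name):
        relevance = mutual_information(features[name], labels)
        if not selected:
            return relevance
        redundancy = sum(
            mutual_information(features[name], features[s]) for s in selected
        ) / len(selected)
        return relevance - redundancy

    while remaining and len(selected) < k:
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): "a_copy" duplicates "a", "noise" is unrelated to the labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "a":      [0, 0, 0, 1, 1, 1, 1, 1],
    "a_copy": [0, 0, 0, 1, 1, 1, 1, 1],
    "b":      [0, 0, 1, 0, 1, 1, 0, 1],
    "noise":  [0, 1, 0, 1, 0, 1, 0, 1],
}
picked = greedy_select(features, labels, k=2)
print(picked)
```

On this toy data the duplicate `a_copy` is penalized for its redundancy with `a`, so the less relevant but complementary feature `b` is chosen second. Real datasets with continuous features would need discretization or a different mutual information estimator, which this sketch does not attempt.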
Pages: 28