A review of feature selection methods on synthetic data

Cited by: 536
Authors
Bolon-Canedo, Veronica [1]
Sanchez-Marono, Noelia [1]
Alonso-Betanzos, Amparo [1]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, La Coruna, Spain
Keywords
Feature selection; Filters; Embedded methods; Wrappers; Synthetic datasets; EFFICIENT FEATURE-SELECTION; MUTUAL INFORMATION; GENE SELECTION; CLASSIFICATION; ALGORITHMS; RELEVANCE; RELIEFF; SEARCH;
DOI
10.1007/s10115-012-0487-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
With the advent of high-dimensional data, adequately identifying the relevant features has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt, and many different methods have been developed. However, with such a vast body of algorithms available, choosing an adequate feature selection method is not easy, and their effectiveness needs to be checked in different situations. Because assessing which features are relevant is difficult in real datasets, an interesting option is to use artificial data. In this paper, several synthetic datasets are employed for this purpose, with the aim of reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied over eleven synthetic datasets and tested with four classifiers, so as to be able to choose a robust method, paving the way for its application to real datasets.
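The idea in the abstract, evaluating a selector on artificial data where the relevant features are known in advance, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the toy class rule (label = feature 0 AND feature 1), the 5% label noise, the mutual-information filter, and the "fraction recovered" score, along with the helper names `make_synthetic`, `mutual_information`, and `rank_features`.

```python
import math
import random
from collections import Counter


def make_synthetic(n_samples=2000, n_irrelevant=8, noise=0.05, seed=0):
    """Binary data where only features 0 and 1 determine the class (assumed toy rule)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.randint(0, 1) for _ in range(2 + n_irrelevant)]
        label = row[0] & row[1]      # class depends only on the two relevant features
        if rng.random() < noise:     # flip a fraction of labels to simulate noise
            label = 1 - label
        X.append(row)
        y.append(label)
    return X, y


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log2( p(a,b) / (p(a) * p(b)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[a] * py[b]))
    return mi


def rank_features(X, y):
    """Filter-style ranking: score each feature independently, highest first."""
    n_features = len(X[0])
    scores = [mutual_information([row[j] for row in X], y) for j in range(n_features)]
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores


X, y = make_synthetic()
ranking, scores = rank_features(X, y)
relevant = {0, 1}                            # known ground truth of the synthetic data
top_k = set(ranking[:len(relevant)])
recovered = len(top_k & relevant) / len(relevant)
print("ranking:", ranking)
print("fraction of relevant features recovered:", recovered)
```

Because the ground truth is known, the selector's output can be checked directly against it rather than only through classifier accuracy. Note that a univariate filter like this one scores each feature in isolation, so it would miss a purely interacting rule such as XOR, where each feature alone carries no information about the class.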
Pages: 483-519
Number of pages: 37
Related papers
50 in total
  • [1] A review of feature selection methods on synthetic data
    Verónica Bolón-Canedo
    Noelia Sánchez-Maroño
    Amparo Alonso-Betanzos
    Knowledge and Information Systems, 2013, 34 : 483 - 519
  • [2] A survey on feature selection methods for mixed data
    Solorio-Fernandez, Saul
    Carrasco-Ochoa, J. Ariel
    Martinez-Trinidad, Jose Francisco
    ARTIFICIAL INTELLIGENCE REVIEW, 2022, 55 (04) : 2821 - 2846
  • [3] A Review of Feature Selection and Its Methods
    Venkatesh, B.
    Anuradha, J.
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2019, 19 (01) : 3 - 26
  • [4] Application of feature selection methods for automated clustering analysis: a review on synthetic datasets
    Ahmad, Aliyu Usman
    Starkey, Andrew
    NEURAL COMPUTING & APPLICATIONS, 2018, 29 (07) : 317 - 328
  • [5] A review of feature selection methods based on mutual information
    Vergara, Jorge R.
    Estevez, Pablo A.
    NEURAL COMPUTING & APPLICATIONS, 2014, 24 (01) : 175 - 186
  • [6] XyGen: Synthetic data generator for feature selection
    Kamalov, Firuz
    Elnaffar, Said
    Sulieman, Hana
    Cherukuri, Aswani Kumar
    SOFTWARE IMPACTS, 2023, 15
  • [7] A review of feature selection methods in medical applications
    Remeseiro, Beatriz
    Bolon-Canedo, Veronica
    COMPUTERS IN BIOLOGY AND MEDICINE, 2019, 112
  • [8] Feature selection methods and genomic big data: a systematic review
    Tadist, Khawla
    Najah, Said
    Nikolov, Nikola S.
    Mrabti, Fatiha
    Zahi, Azeddine
    JOURNAL OF BIG DATA, 2019, 6 (01)
  • [9] Benchmark study of feature selection strategies for multi-omics data
    Li, Yingxia
    Mansmann, Ulrich
    Du, Shangming
    Hornung, Roman
    BMC BIOINFORMATICS, 2022, 23 (01)
  • [10] Feature selection methods on gene expression microarray data for cancer classification: A systematic review
    Alhenawi, Esra'a
    Al-Sayyed, Rizik
    Hudaib, Amjad
    Mirjalili, Seyedali
    COMPUTERS IN BIOLOGY AND MEDICINE, 2022, 140