A review of feature selection methods on synthetic data

Times cited: 536
Authors
Bolon-Canedo, Veronica [1]
Sanchez-Marono, Noelia [1]
Alonso-Betanzos, Amparo [1]
Affiliation
[1] Univ A Coruna, Dept Comp Sci, La Coruna, Spain
Keywords
Feature selection; Filters; Embedded methods; Wrappers; Synthetic datasets; EFFICIENT FEATURE-SELECTION; MUTUAL INFORMATION; GENE SELECTION; CLASSIFICATION; ALGORITHMS; RELEVANCE; RELIEFF; SEARCH
DOI
10.1007/s10115-012-0487-8
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
With the advent of high dimensionality, adequately identifying the relevant features of the data has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt, and many different methods have been developed. However, with such a vast body of algorithms available, choosing an adequate feature selection method is not straightforward, and their effectiveness must be checked in different situations. Because the truly relevant features of a real dataset are usually unknown, assessing them is difficult, so an interesting option is to use artificial data. In this paper, several synthetic datasets are employed for this purpose. The aim is to review the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, and a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied to eleven synthetic datasets and evaluated with four classifiers, in order to choose a robust method and pave the way for its application to real datasets.
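The appeal of synthetic data described above is that the relevant features are known by construction, so a selector's ranking can be checked directly. The following is a minimal sketch of that idea, not the paper's experimental setup. The generators `make_linear_dataset` and `make_xor_dataset` are hypothetical, and `rank_features` is a simple univariate correlation filter used as a stand-in; it is not one of the seven filters evaluated in the paper. The sketch shows two of the situations the abstract lists. In the first, a univariate filter ranks the relevant features and their redundant copy above irrelevant noise. In the second, two features determine the class only jointly (XOR), and the same filter cannot detect them.

```python
import math
import random


def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def make_linear_dataset(n_samples, n_irrelevant, seed=0):
    """Hypothetical generator: features 0 and 1 are relevant, feature 2 is a
    noisy (redundant) copy of feature 0, and the rest are irrelevant noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        x0, x1 = rng.gauss(0, 1), rng.gauss(0, 1)
        redundant = x0 + 0.3 * rng.gauss(0, 1)
        noise = [rng.gauss(0, 1) for _ in range(n_irrelevant)]
        X.append([x0, x1, redundant] + noise)
        y.append(1 if x0 + x1 > 0 else 0)
    return X, y


def make_xor_dataset(n_samples, n_irrelevant, seed=0):
    """Hypothetical generator: the class is the XOR of the signs of features
    0 and 1, so neither is informative on its own (pure interaction)."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_samples):
        x0, x1 = rng.gauss(0, 1), rng.gauss(0, 1)
        noise = [rng.gauss(0, 1) for _ in range(n_irrelevant)]
        X.append([x0, x1] + noise)
        y.append(1 if (x0 > 0) != (x1 > 0) else 0)
    return X, y


def rank_features(X, y):
    """Univariate filter: rank feature indices by |correlation with the class|."""
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        column = [row[j] for row in X]
        scores.append(abs(pearson(column, y)))
    ranking = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return ranking, scores


if __name__ == "__main__":
    X, y = make_linear_dataset(1000, n_irrelevant=20, seed=1)
    ranking, scores = rank_features(X, y)
    print("linear: top 3 features =", ranking[:3])

    X, y = make_xor_dataset(1000, n_irrelevant=20, seed=1)
    ranking, scores = rank_features(X, y)
    print("xor: scores of the two relevant features =", scores[0], scores[1])
```

On the linear dataset, features 0, 1, and 2 should occupy the top three positions. This also shows that a pure ranking filter keeps redundant features unless redundancy is handled explicitly. On the XOR dataset, both relevant features score near zero, which is why interaction between attributes is a useful stress test for univariate filters.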
Pages: 483-519
Number of pages: 37