A review of feature selection methods on synthetic data

Cited by: 536
Authors
Bolon-Canedo, Veronica [1]
Sanchez-Marono, Noelia [1]
Alonso-Betanzos, Amparo [1]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, La Coruna, Spain
Keywords
Feature selection; Filters; Embedded methods; Wrappers; Synthetic datasets; EFFICIENT FEATURE-SELECTION; MUTUAL INFORMATION; GENE SELECTION; CLASSIFICATION; ALGORITHMS; RELEVANCE; RELIEFF; SEARCH;
DOI
10.1007/s10115-012-0487-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
With the advent of high-dimensional data, adequately identifying the relevant features has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt, and many different methods have been developed. However, with such a vast body of algorithms available, choosing an adequate feature selection method is not easy, and it is necessary to check their effectiveness in different situations. Assessing which features are relevant is difficult in real datasets, so an interesting option is to use artificial data, where the relevant features are known in advance. In this paper, several synthetic datasets are employed for this purpose, with the aim of reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, and a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied to eleven synthetic datasets and tested with four classifiers, so that a robust method can be chosen, paving the way for its application to real datasets.
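The evaluation idea in the abstract can be illustrated with a minimal sketch: build a synthetic dataset whose relevant features are known by construction, rank the features with a simple filter, and check how many relevant features reach the top of the ranking. Everything below is an assumption made for illustration and is not taken from the paper: the function names (`make_synthetic`, `correlation_filter`, `recovered_fraction`), the linear labelling rule, the label-flipping noise model, and the absolute-correlation score, which stands in for the filters the paper actually studies.

```python
import numpy as np


def make_synthetic(n_samples=500, n_relevant=3, n_irrelevant=20, noise=0.05, seed=0):
    """Hypothetical generator: only the first n_relevant columns determine the label."""
    rng = np.random.default_rng(seed)
    X_rel = rng.normal(size=(n_samples, n_relevant))
    X_irr = rng.normal(size=(n_samples, n_irrelevant))
    # Label depends on the relevant features only (a simple linear rule).
    y = (X_rel.sum(axis=1) > 0).astype(int)
    # Flip a fraction of the labels to simulate noise in the data.
    flip = rng.random(n_samples) < noise
    y = np.where(flip, 1 - y, y)
    X = np.hstack([X_rel, X_irr])
    relevant = set(range(n_relevant))
    return X, y, relevant


def correlation_filter(X, y):
    """Rank features by absolute Pearson correlation with the label (best first)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    scores = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))
    return np.argsort(scores)[::-1]


def recovered_fraction(ranking, relevant):
    """Fraction of the known relevant features found among the top-k ranked features."""
    k = len(relevant)
    top_k = set(ranking[:k].tolist())
    return len(top_k & relevant) / k


if __name__ == "__main__":
    X, y, relevant = make_synthetic()
    ranking = correlation_filter(X, y)
    print("top-ranked features:", ranking[:len(relevant)].tolist())
    print("fraction of relevant features recovered:", recovered_fraction(ranking, relevant))
```

Raising `n_irrelevant` or `noise`, or lowering `n_samples`, gives a rough feel for the stress conditions the abstract lists. A univariate score like this one cannot detect interactions between attributes, which is one reason a review of this kind compares several families of methods.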
Pages: 483-519
Number of pages: 37