A review of feature selection methods on synthetic data

Cited by: 536
Authors
Bolon-Canedo, Veronica [1]
Sanchez-Marono, Noelia [1]
Alonso-Betanzos, Amparo [1]
Affiliations
[1] Univ A Coruna, Dept Comp Sci, La Coruna, Spain
Keywords
Feature selection; Filters; Embedded methods; Wrappers; Synthetic datasets; EFFICIENT FEATURE-SELECTION; MUTUAL INFORMATION; GENE SELECTION; CLASSIFICATION; ALGORITHMS; RELEVANCE; RELIEFF; SEARCH;
DOI
10.1007/s10115-012-0487-8
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract
With the advent of high-dimensional data, adequate identification of the relevant features has become indispensable in real-world scenarios. In this context, the importance of feature selection is beyond doubt, and many different methods have been developed. However, with such a vast body of algorithms available, choosing an adequate feature selection method is not easy, and it is necessary to check their effectiveness in different situations. Assessing relevant features is difficult in real datasets, however, so an interesting option is to use artificial data. In this paper, several synthetic datasets are employed for this purpose, with the aim of reviewing the performance of feature selection methods in the presence of a growing number of irrelevant features, noise in the data, redundancy and interaction between attributes, as well as a small ratio between the number of samples and the number of features. Seven filters, two embedded methods, and two wrappers are applied to eleven synthetic datasets and tested with four classifiers, in order to identify a robust method and pave the way for its application to real datasets.
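The idea of scoring a feature selector against a known ground truth can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the dataset generator (`make_dataset`, a majority vote over three binary features plus random irrelevant features and label noise), the stand-in filter (ranking by absolute Pearson correlation with the class), and the `recovery_rate` score are hypothetical and are not the datasets, filters, or evaluation measure used by the authors.

```python
import random
from statistics import mean, pstdev


def make_dataset(n_samples=500, n_irrelevant=20, label_noise=0.05, seed=0):
    """Hypothetical synthetic data: the class is the majority vote of
    features 0, 1 and 2; all remaining features are random and irrelevant."""
    rng = random.Random(seed)
    n_features = 3 + n_irrelevant
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.randint(0, 1) for _ in range(n_features)]
        label = int(row[0] + row[1] + row[2] >= 2)
        if rng.random() < label_noise:  # flip some labels to simulate noise
            label = 1 - label
        X.append(row)
        y.append(label)
    return X, y, {0, 1, 2}


def abs_correlation(xs, ys):
    """Absolute Pearson correlation; 0.0 if either input is constant."""
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    mx, my = mean(xs), mean(ys)
    cov = mean((a - mx) * (b - my) for a, b in zip(xs, ys))
    return abs(cov / (sx * sy))


def rank_features(X, y):
    """Stand-in univariate filter: feature indices sorted by decreasing score."""
    columns = list(zip(*X))
    scores = [abs_correlation(col, y) for col in columns]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)


def recovery_rate(ranking, relevant):
    """Fraction of the truly relevant features found among the top-ranked ones."""
    top = set(ranking[:len(relevant)])
    return len(top & relevant) / len(relevant)


X, y, relevant = make_dataset()
ranking = rank_features(X, y)
score = recovery_rate(ranking, relevant)
print(ranking[:3], score)
```

Because the relevant features are known by construction, the selector can be judged directly on which features it picks rather than only on downstream classifier accuracy. Raising `n_irrelevant` or `label_noise`, or lowering `n_samples`, gives a rough feel for the stress conditions the abstract lists.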
Pages: 483-519
Number of pages: 37