A Genetic Programming approach for feature selection in highly dimensional skewed data

Cited by: 59
Authors
Viegas, Felipe [2 ]
Rocha, Leonardo [1 ]
Goncalves, Marcos [2 ]
Mourao, Fernando [1 ]
Sa, Giovanni [1 ]
Salles, Thiago [2 ]
Andrade, Guilherme [2 ]
Sandin, Isac [1 ]
Affiliations
[1] Univ Fed Sao Joao del Rei, Dept Comp Sci, Sao Joao Del Rei, MG, Brazil
[2] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
Keywords
Feature selection; Classification; Genetic Programming
DOI
10.1016/j.neucom.2017.08.050
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
High dimensionality, also known as the curse of dimensionality, is still a major challenge for automatic classification solutions. Accordingly, several feature selection (FS) strategies have been proposed for dimensionality reduction over the years. However, they may perform poorly in the face of unbalanced data. In this work, we propose a novel feature selection strategy based on Genetic Programming that is resilient to data skewness; that is, it works well with both balanced and unbalanced data. The proposed strategy combines the most discriminative feature sets selected by distinct feature selection metrics in order to obtain a more effective and impartial set of the most discriminative features, starting from the hypothesis that distinct feature selection metrics produce different (and potentially complementary) feature space projections. We evaluated our proposal on biological and textual datasets. Our experimental results show that the proposed solution not only increases the efficiency of the learning process, reducing the size of the data space by up to 83%, but also significantly increases its effectiveness in some scenarios. (C) 2017 Elsevier B.V. All rights reserved.
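The idea of combining feature sets chosen by distinct selection metrics can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how individuals are encoded, so the set operators (`union`, `intersection`, `difference`), the metric names (`metric_a`, `metric_b`, `metric_c`), the scores, and the tree encoding below are all assumptions made for illustration. The sketch only evaluates one candidate expression tree; a Genetic Programming search would evolve many such trees and score each one by classifier performance, which is omitted here.

```python
from typing import Dict, FrozenSet, Sequence, Tuple, Union

# A candidate is either a leaf (the name of a metric's feature set)
# or a tuple (operator, left_subtree, right_subtree). Hypothetical encoding.
Tree = Union[str, Tuple[str, "Tree", "Tree"]]

# Assumed set operators used as internal nodes of the tree.
OPS = {
    "union": lambda a, b: a | b,
    "intersection": lambda a, b: a & b,
    "difference": lambda a, b: a - b,
}


def top_k(scores: Sequence[float], k: int) -> FrozenSet[int]:
    """Return the indices of the k highest-scoring features (ties by index)."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return frozenset(order[:k])


def evaluate(tree: Tree, leaves: Dict[str, FrozenSet[int]]) -> FrozenSet[int]:
    """Recursively evaluate a tree of set operations into one feature set."""
    if isinstance(tree, str):
        return leaves[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, leaves), evaluate(right, leaves))


# Made-up scores for six features under three hypothetical metrics.
scores = {
    "metric_a": [0.9, 0.1, 0.8, 0.3, 0.2, 0.7],
    "metric_b": [0.2, 0.9, 0.7, 0.1, 0.8, 0.3],
    "metric_c": [0.5, 0.4, 0.9, 0.8, 0.1, 0.2],
}
leaves = {name: top_k(values, k=3) for name, values in scores.items()}

# One candidate: (A intersect B) union (C minus B).
candidate: Tree = (
    "union",
    ("intersection", "metric_a", "metric_b"),
    ("difference", "metric_c", "metric_b"),
)

selected = evaluate(candidate, leaves)
reduction = 1 - len(selected) / 6  # fraction of the feature space removed
print(sorted(selected), reduction)
```

In this toy case the candidate keeps features 0, 2, and 3, which removes half of the six features. The figures are illustrative only and unrelated to the 83% reduction reported in the abstract.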
Pages: 554-569
Page count: 16
Related papers
50 in total
  • [31] A filter feature selection for high-dimensional data
    Janane, Fatima Zahra
    Ouaderhman, Tayeb
    Chamlal, Hasna
    [J]. JOURNAL OF ALGORITHMS & COMPUTATIONAL TECHNOLOGY, 2023, 17
  • [32] Feature selection for speaker verification using genetic programming
    Loughran R.
    Agapitos A.
    Kattan A.
    Brabazon A.
    O’Neill M.
    [J]. Evolutionary Intelligence, 2017, 10 (1-2) : 1 - 21
  • [33] A genetic programming approach to feature selection and classification of instantaneous cognitive states
    Ramirez, Rafael
    Puiggros, Montserrat
    [J]. APPLICATIONS OF EVOLUTIONARY COMPUTING, PROCEEDINGS, 2007, 4448 : 311 - +
  • [34] Exploring SLUG: Feature Selection Using Genetic Algorithms and Genetic Programming
    Rodrigues N.M.
    Batista J.E.
    Cava W.L.
    Vanneschi L.
    Silva S.
    [J]. SN Computer Science, 5 (1)
  • [35] A binary-constrained Geometric Semantic Genetic Programming for feature selection purposes
    Papa, Joao Paulo
    Rosa, Gustavo Henrique
    Papa, Luciene Patrici
    [J]. PATTERN RECOGNITION LETTERS, 2017, 100 : 59 - 66
  • [36] Reusing Genetic Programming for Ensemble Selection in Classification of Unbalanced Data
    Bhowan, Urvesh
    Johnston, Mark
    Zhang, Mengjie
    Yao, Xin
    [J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2014, 18 (06) : 893 - 908
  • [37] A Wrapper Feature Selection Approach to Classification with Missing Data
    Cao Truong Tran
    Zhang, Mengjie
    Andreae, Peter
    Xue, Bing
    [J]. APPLICATIONS OF EVOLUTIONARY COMPUTATION, EVOAPPLICATIONS 2016, PT I, 2016, 9597 : 685 - 700
  • [38] Genetic Programming with Noise Sensitivity for Imputation Predictor Selection in Symbolic Regression with Incomplete Data
    Al-Helali, Baligh
    Chen, Qi
    Xue, Bing
    Zhang, Mengjie
    [J]. 2020 IEEE CONGRESS ON EVOLUTIONARY COMPUTATION (CEC), 2020
  • [39] Improving feature ranking for biomarker discovery in proteomics mass spectrometry data using genetic programming
    Ahmed, Soha
    Zhang, Mengjie
    Peng, Lifeng
    [J]. CONNECTION SCIENCE, 2014, 26 (03) : 215 - 243
  • [40] Learning and Sharing: A Multitask Genetic Programming Approach to Image Feature Learning
    Bi, Ying
    Xue, Bing
    Zhang, Mengjie
    [J]. IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2022, 26 (02) : 218 - 232