A Genetic Programming approach for feature selection in highly dimensional skewed data

Cited by: 61
Authors
Viegas, Felipe [2]
Rocha, Leonardo [1]
Goncalves, Marcos [2]
Mourao, Fernando [1]
Sa, Giovanni [1]
Salles, Thiago [2]
Andrade, Guilherme [2]
Sandin, Isac [1]
Affiliations
[1] Univ Fed Sao Joao del Rei, Dept Comp Sci, Sao Joao del Rei, MG, Brazil
[2] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
Keywords
Feature selection; Classification; Genetic Programming
DOI
10.1016/j.neucom.2017.08.050
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
High dimensionality, also known as the curse of dimensionality, is still a major challenge for automatic classification solutions. Accordingly, several feature selection (FS) strategies for dimensionality reduction have been proposed over the years. However, they may perform poorly on unbalanced data. In this work, we propose a novel feature selection strategy based on Genetic Programming that is resilient to data skewness; in other words, it works well with both balanced and unbalanced data. Departing from the hypothesis that distinct feature selection metrics produce different (and potentially complementary) projections of the feature space, the proposed strategy combines the most discriminative feature sets selected by distinct metrics in order to obtain a more effective and impartial set of discriminative features. We evaluated our proposal on biological and textual datasets. Our experimental results show that the proposed solution not only increases the efficiency of the learning process, reducing the size of the data space by up to 83%, but also significantly increases its effectiveness in some scenarios. (C) 2017 Elsevier B.V. All rights reserved.
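The abstract's central idea, that different feature selection metrics pick different and possibly complementary feature subsets which can then be combined, can be illustrated with a toy example. This is a minimal sketch under our own assumptions, not the authors' method: the two metrics (`document_frequency`, `class_rate_gap`), the set operators `union` and `inter`, and the tuple-based tree encoding are all hypothetical choices. The evolutionary search and fitness evaluation that a GP would perform are omitted; only two hand-written trees are evaluated.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple, Union

# Hypothetical encoding: a tree is a metric name (leaf) or (operator, left, right).
Tree = Union[str, Tuple[str, "Tree", "Tree"]]
Matrix = Sequence[Sequence[int]]


def document_frequency(X: Matrix) -> List[float]:
    """Label-agnostic score: number of samples in which each feature occurs."""
    return [float(sum(row[j] for row in X)) for j in range(len(X[0]))]


def class_rate_gap(X: Matrix, y: Sequence[int]) -> List[float]:
    """|P(present | class 0) - P(present | class 1)|, normalized per class."""
    rows0 = [row for row, label in zip(X, y) if label == 0]
    rows1 = [row for row, label in zip(X, y) if label == 1]
    return [
        abs(sum(r[j] for r in rows0) / len(rows0) - sum(r[j] for r in rows1) / len(rows1))
        for j in range(len(X[0]))
    ]


def top_k(scores: List[float], k: int) -> FrozenSet[int]:
    """Indices of the k highest scores; ties go to the lower feature index."""
    ranked = sorted(range(len(scores)), key=lambda j: (-scores[j], j))
    return frozenset(ranked[:k])


def evaluate(tree: Tree, leaves: Dict[str, FrozenSet[int]]) -> FrozenSet[int]:
    """Recursively combine the per-metric feature sets named in the tree."""
    if isinstance(tree, str):
        return leaves[tree]
    op, left, right = tree
    a, b = evaluate(left, leaves), evaluate(right, leaves)
    if op == "union":
        return a | b
    if op == "inter":
        return a & b
    raise ValueError(f"unknown operator: {op}")


# Toy binary data: 6 samples x 5 features, 3 samples per class.
X = [
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 0, 0, 1, 0],
]
y = [0, 0, 0, 1, 1, 1]

k = 3
leaves = {
    "df": top_k(document_frequency(X), k),
    "gap": top_k(class_rate_gap(X, y), k),
}
print(sorted(leaves["df"]), sorted(leaves["gap"]))       # [0, 1, 3] [1, 2, 3]
print(sorted(evaluate(("union", "df", "gap"), leaves)))  # [0, 1, 2, 3]
print(sorted(evaluate(("inter", "df", "gap"), leaves)))  # [1, 3]
```

With this toy data, the label-agnostic frequency ranking keeps feature 0, which occurs in every sample and so carries no class information, while the class-conditional gap ranking keeps feature 2 instead; the two selections overlap only on features 1 and 3. A GP search would evolve trees like these under some fitness measure (for example, the validation accuracy of a classifier trained on the selected features); that search is not modeled here.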
Pages: 554-569
Number of pages: 16