A Genetic Programming approach for feature selection in highly dimensional skewed data

Cited by: 59
Authors
Viegas, Felipe [2]
Rocha, Leonardo [1]
Goncalves, Marcos [2]
Mourao, Fernando [1]
Sa, Giovanni [1]
Salles, Thiago [2]
Andrade, Guilherme [2]
Sandin, Isac [1]
Affiliations
[1] Univ Fed Sao Joao del Rei, Dept Comp Sci, Sao Joao Del Rei, MG, Brazil
[2] Univ Fed Minas Gerais, Dept Comp Sci, Belo Horizonte, MG, Brazil
Keywords
Feature selection; Classification; Genetic Programming
DOI
10.1016/j.neucom.2017.08.050
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
High dimensionality, also known as the curse of dimensionality, is still a major challenge for automatic classification solutions. Accordingly, several feature selection (FS) strategies have been proposed for dimensionality reduction over the years. However, they potentially perform poorly in the face of unbalanced data. In this work, we propose a novel feature selection strategy based on Genetic Programming that is resilient to data skewness issues; in other words, it works well with both balanced and unbalanced data. The proposed strategy departs from the hypothesis that distinct feature selection metrics produce different (and potentially complementary) feature space projections. It therefore combines the most discriminative feature sets selected by distinct feature selection metrics in order to obtain a more effective and impartial set of the most discriminative features. We evaluated our proposal on biological and textual datasets. Our experimental results show that the proposed solution not only increases the efficiency of the learning process, reducing the size of the data space by up to 83%, but also significantly increases its effectiveness in some scenarios. (C) 2017 Elsevier B.V. All rights reserved.
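The idea of combining feature sets chosen by different metrics can be pictured with a small Python sketch. The abstract does not say how individuals are encoded, so everything here is an assumption made for illustration: the expression-tree representation over set operations, the helper names `top_k` and `evaluate`, and the made-up scores. A Genetic Programming search would presumably evolve many such trees and score each by classifier effectiveness; that search loop is not shown.

```python
# Hypothetical sketch: an expression tree that combines the feature sets
# picked by different selection metrics. This is NOT the paper's encoding.

def top_k(scores, k):
    """Return the k highest-scoring feature names as a set (ties broken by name)."""
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return set(ranked[:k])

# Set operations assumed to be available as internal tree nodes.
OPS = {
    "union": lambda a, b: a | b,
    "intersection": lambda a, b: a & b,
    "difference": lambda a, b: a - b,
}

def evaluate(tree, leaves):
    """Evaluate a nested tuple (op, left, right); string leaves name a feature set."""
    if isinstance(tree, str):
        return leaves[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, leaves), evaluate(right, leaves))

# Made-up scores for five features under three hypothetical metrics.
metric_a = {"f1": 0.9, "f2": 0.7, "f3": 0.2, "f4": 0.1, "f5": 0.05}
metric_b = {"f1": 0.3, "f2": 0.8, "f3": 0.9, "f4": 0.2, "f5": 0.1}
metric_c = {"f1": 0.1, "f2": 0.6, "f3": 0.3, "f4": 0.9, "f5": 0.2}

leaves = {
    "A": top_k(metric_a, 2),  # {"f1", "f2"}
    "B": top_k(metric_b, 2),  # {"f2", "f3"}
    "C": top_k(metric_c, 2),  # {"f2", "f4"}
}

# One candidate individual: (A intersect B) union (C minus A).
individual = ("union", ("intersection", "A", "B"), ("difference", "C", "A"))
selected = evaluate(individual, leaves)
print(sorted(selected))  # ['f2', 'f4']
```

In this toy case the tree keeps `f2`, on which metrics A and B agree, and adds `f4`, which only metric C ranks highly, showing how complementary metrics could contribute different features to the final set.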
Pages: 554-569
Number of pages: 16