Genetic programming for feature construction and selection in classification on high-dimensional data

Cited by: 136
Authors
Tran, Binh [1]
Xue, Bing [1]
Zhang, Mengjie [1]
Affiliations
[1] Victoria Univ Wellington, Evolutionary Computat Res Grp, POB 600, Wellington 6140, New Zealand
Keywords
Genetic programming; Feature construction; Feature selection; Classification; High-dimensional data; ALGORITHM; OPTIMIZATION; CLASSIFIERS
DOI
10.1007/s12293-015-0173-y
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Classification on high-dimensional data with thousands to tens of thousands of dimensions is a challenging task due to the high dimensionality and the quality of the feature set. The problem can be addressed by using feature selection to choose only informative features or feature construction to create new high-level features. Genetic programming (GP) using a tree-based representation can be used for both feature construction and implicit feature selection. This work presents a comprehensive study investigating the use of GP for feature construction and selection on high-dimensional classification problems. Different combinations of the constructed and/or selected features are tested and compared on seven high-dimensional gene expression problems, and different classification algorithms are used to evaluate their performance. The results show that the constructed and/or selected feature sets can significantly reduce the dimensionality and maintain or even increase the classification accuracy in most cases. The cases in which overfitting occurred are analysed via the distribution of features. Further analysis is also performed to show why the constructed feature can achieve promising classification performance.
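The abstract's point that a tree-based GP individual performs feature construction and implicit feature selection at the same time can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the nested-tuple tree encoding, the function set in OPS, and the helper names evaluate and selected_features are hypothetical and are not taken from the paper. The value of the tree on a sample is the constructed feature, and the original features appearing at its leaves form the implicitly selected subset.

```python
import operator

# Hypothetical function set; this record does not state the operators the paper uses.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, row):
    """Evaluate an expression tree on one sample; the result is the constructed feature."""
    if isinstance(tree, int):  # leaf: index of an original feature
        return row[tree]
    op, left, right = tree
    return OPS[op](evaluate(left, row), evaluate(right, row))


def selected_features(tree):
    """Collect the original feature indices used as leaves (implicit selection)."""
    if isinstance(tree, int):
        return {tree}
    _, left, right = tree
    return selected_features(left) | selected_features(right)


# Illustrative tree (f3 * f7) - f3 applied to a made-up sample with 10 features.
tree = ("-", ("*", 3, 7), 3)
row = [0.5, 1.0, 2.0, 4.0, 0.0, 1.5, 3.0, 2.5, 9.0, 6.0]

constructed = evaluate(tree, row)         # one new high-level feature value
chosen = sorted(selected_features(tree))  # original features the tree relies on
print(constructed, chosen)
```

In this toy case the tree reduces ten original features to one constructed feature and a selected subset of two, which is the kind of dimensionality reduction the abstract describes. How trees are evolved and scored is not covered by this sketch.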
Pages: 3-15
Number of pages: 13