Genetic Programming based Feature Construction for Classification with Incomplete Data

Citations: 6
Authors
Cao Truong Tran [1]
Zhang, Mengjie [1]
Andreae, Peter [1]
Xue, Bing [1]
Affiliation
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, POB 600, Wellington 6140, New Zealand
Source
PROCEEDINGS OF THE 2017 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE (GECCO'17) | 2017
Keywords
incomplete data; feature construction; genetic programming; classification; multiple feature construction; missing values; imputation; impact
DOI
10.1145/3071178.3071183
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Missing values are an unavoidable problem in many real-world datasets. Dealing with incomplete data is a crucial requirement for classification because inadequate treatment of missing values often causes large classification error. Feature construction has been successfully applied to improve classification with complete data, but it has seldom been applied to incomplete data. Genetic programming-based multiple feature construction (GPMFC) is a recent, promising feature construction method that uses genetic programming to evolve multiple new features from the original features for classification tasks. GPMFC can improve the accuracy and reduce the complexity of many decision tree and rule-based classifiers; however, it cannot work directly with incomplete data. This paper proposes IGPMFC, an extension of GPMFC that handles incomplete data. IGPMFC uses genetic programming with interval functions to directly evolve multiple features for classification with incomplete data. Experimental results show that IGPMFC can not only substantially improve the accuracy but also reduce the complexity of classifiers learnt from incomplete data.
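The abstract does not spell out how interval functions operate, so the following Python sketch is only one plausible reading, not the authors' implementation. It assumes that an observed value x is treated as the degenerate interval [x, x], that a missing value is replaced by the [min, max] range of its feature over the observed data, and that a constructed feature is evaluated with standard interval arithmetic. The names `Interval`, `feature_bounds`, `to_intervals`, `construct`, and `midpoint` are hypothetical, the expression in `construct` is a hand-written stand-in for an evolved GP tree, and collapsing the result to its midpoint is an assumption made here for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Closed interval [lo, hi] with basic interval arithmetic."""
    lo: float
    hi: float

    def __add__(self, other):
        return Interval(self.lo + other.lo, self.hi + other.hi)

    def __sub__(self, other):
        # Smallest result pairs our low end with the other's high end.
        return Interval(self.lo - other.hi, self.hi - other.lo)

    def __mul__(self, other):
        # Signs may differ, so check all four endpoint products.
        products = [
            self.lo * other.lo, self.lo * other.hi,
            self.hi * other.lo, self.hi * other.hi,
        ]
        return Interval(min(products), max(products))


def feature_bounds(rows):
    """Per-feature (min, max) over observed values; None marks a missing value."""
    bounds = []
    for column in zip(*rows):
        observed = [v for v in column if v is not None]
        bounds.append((min(observed), max(observed)))
    return bounds


def to_intervals(row, bounds):
    """Assumption: missing -> feature range, observed x -> [x, x]."""
    return [
        Interval(*bounds[i]) if v is None else Interval(v, v)
        for i, v in enumerate(row)
    ]


def construct(x):
    """Hypothetical constructed feature, standing in for an evolved GP tree."""
    return x[0] * x[1] - x[2]


def midpoint(interval):
    """Assumption: collapse the output interval to one scalar feature value."""
    return (interval.lo + interval.hi) / 2.0


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, None, 1.0], [4.0, 6.0, None]]
    b = feature_bounds(data)
    for row in data:
        out = construct(to_intervals(row, b))
        print(row, out, midpoint(out))
```

Under these assumptions an incomplete instance never needs imputation before feature construction: the uncertainty of a missing input is carried through the expression as a wider output interval.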
Pages: 1033-1040
Page count: 8