An effective and efficient approach to classification with incomplete data

Cited by: 26
Authors
Cao Truong Tran [1, 2]
Zhang, Mengjie [1]
Andreae, Peter [1]
Xue, Bing [1]
Lam Thu Bui [2]
Affiliations
[1] Victoria Univ Wellington, Sch Engn & Comp Sci, POB 600, Wellington 6140, New Zealand
[2] Le Quy Don Tech Univ, Res Grp Computat Intelligence, 236 Hoang Quoc Viet St, Hanoi, Vietnam
Keywords
Incomplete data; Missing data; Classification; Imputation; Feature selection; Ensemble learning; FEATURE-SELECTION; MISSING VALUES; MUTUAL INFORMATION; IMPUTATION METHODS; ENSEMBLE; IMPACT;
DOI
10.1016/j.knosys.2018.05.013
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Many real-world datasets unavoidably contain missing values. Classification with incomplete data must be handled carefully, because inadequate treatment of missing values will cause large classification errors. A common approach is to use imputation to transform incomplete data into complete data. However, simple imputation methods are often inaccurate, and powerful imputation methods are usually computationally intensive. A recent alternative constructs an ensemble of classifiers, each tailored to a known pattern of missing data. Its main advantage is that it can classify new incomplete instances without requiring any imputation. This paper proposes an improvement on the ensemble approach that integrates imputation and genetic-based feature selection. Imputation creates higher-quality training data. Feature selection reduces the number of missing patterns, which increases the speed of classification and greatly increases the fraction of new instances that the ensemble can classify. Experimental results show that the proposed method is more accurate and faster than previous common methods for classification with incomplete data.
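The effect of feature selection on missing patterns can be illustrated with a minimal sketch. It is not the authors' implementation: it trains no classifiers and runs no genetic algorithm. It only counts the feature subsets an ensemble would need (one per missing pattern seen in training) and checks which new instances at least one such classifier could handle. The toy data, the helper names (`missing_pattern`, `classifier_feature_sets`, `coverage`), and the selected subset `[0, 1]` are all hypothetical, chosen purely for illustration.

```python
from typing import FrozenSet, Iterable, List, Optional, Sequence, Set

Row = Sequence[Optional[float]]  # None marks a missing value


def missing_pattern(row: Row, features: Iterable[int]) -> FrozenSet[int]:
    """Return the indices in `features` whose value is missing in `row`."""
    return frozenset(i for i in features if row[i] is None)


def classifier_feature_sets(train: List[Row], features: Iterable[int]) -> Set[FrozenSet[int]]:
    """One feature subset per missing pattern seen in training.

    Each subset holds the selected features that are observed under that
    pattern; a classifier would be trained on exactly those features.
    """
    selected = frozenset(features)
    subsets = {selected - missing_pattern(row, selected) for row in train}
    subsets.discard(frozenset())  # a classifier with no features is useless
    return subsets


def coverage(train: List[Row], new: List[Row], features: Iterable[int]) -> float:
    """Fraction of new instances that at least one classifier can handle.

    A classifier applies to an instance only if every feature it was
    trained on is observed in that instance, so no imputation is needed.
    """
    subsets = classifier_feature_sets(train, features)
    classified = sum(
        1 for row in new
        if any(all(row[i] is not None for i in subset) for subset in subsets)
    )
    return classified / len(new)


# Hypothetical toy data with four features.
train = [
    [1.0, 2.0, None, 4.0],
    [1.5, None, 3.0, 4.2],
    [0.9, 2.1, 3.1, None],
    [1.1, 2.2, 3.0, 4.1],
    [1.2, 1.9, None, None],
]
new = [
    [1.0, None, None, None],
    [1.0, 2.0, None, None],
    [None, 2.0, 3.0, 4.0],
]

all_features = range(4)
selected_features = [0, 1]  # assumed output of a feature selector

for name, feats in [("all features", all_features), ("selected", selected_features)]:
    n_classifiers = len(classifier_feature_sets(train, feats))
    print(name, n_classifiers, round(coverage(train, new, feats), 3))
```

On this toy data, restricting the ensemble to the selected features lowers the number of classifiers that must be built and raises the share of new instances that can be classified without imputation, which mirrors the two benefits the abstract attributes to feature selection.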
Pages: 1-16
Page count: 16