On the choice of the best imputation methods for missing values considering three groups of classification methods

Cited by: 0
Authors
Julián Luengo
Salvador García
Francisco Herrera
Affiliations
[1] CITIC-University of Granada, Department of Computer Science and Artificial Intelligence
[2] University of Jaén, Dept. of Computer Science
Source
Knowledge and Information Systems | 2012 / Vol. 32
Keywords
Approximate models; Classification; Imputation; Rule induction learning; Lazy learning; Missing values; Single imputation;
DOI
Not available
Abstract
In real-life data, information is frequently lost in data mining because of missing values in attributes. Several schemes have been studied to overcome the drawbacks that missing values produce in data mining tasks; one of the best known is based on preprocessing and is known as imputation. In this work, we focus on a classification task in which twenty-three classification methods and fourteen different imputation approaches to missing-value treatment are presented and analyzed. The analysis takes a group-based approach, in which we distinguish between three categories of classification methods. Each category behaves differently, and the evidence obtained shows that using specific missing-value imputation methods could improve the accuracy obtained for these methods. The study establishes the convenience of using imputation methods to preprocess data sets with missing values. The analysis suggests that the choice of imputation method should be conditioned on the group of classification methods.
Pages: 77-108
Page count: 31