Missing value imputation: a review and analysis of the literature (2006–2017)

Cited by: 6
Authors
Wei-Chao Lin
Chih-Fong Tsai
Affiliations
[1] Chang Gung University, Department of Information Management
[2] Chang Gung University, Healthy Aging Research Center
[3] Chang Gung Memorial Hospital, Department of Thoracic Surgery
[4] National Central University, Department of Information Management
Source
Artificial Intelligence Review | 2020 / Vol. 53
Keywords
Missing values; Imputation; Supervised learning; Incomplete dataset; Data mining;
DOI
Not available
Abstract
Missing value imputation (MVI) has been studied for several decades as the basic solution method for incomplete dataset problems, specifically those in which some data samples contain one or more missing attribute values. This paper reviews and analyzes related studies carried out in recent decades from the experimental design perspective. Altogether, 111 journal papers published from 2006 to 2017 are reviewed and analyzed. In addition, several technical issues encountered during the MVI process are discussed, such as the choice of datasets, missing rates and missingness mechanisms, and the MVI techniques and evaluation metrics employed. The results of the analysis of these issues allow limitations in the existing body of literature to be identified, on the basis of which some directions for future research can be gleaned.
Pages: 1487-1509
Page count: 22