Missing value imputation: a review and analysis of the literature (2006–2017)

Cited by: 6
Authors
Wei-Chao Lin
Chih-Fong Tsai
Affiliations
[1] Chang Gung University,Department of Information Management
[2] Chang Gung University,Healthy Aging Research Center
[3] Chang Gung Memorial Hospital,Department of Thoracic Surgery
[4] National Central University,Department of Information Management
Source
Artificial Intelligence Review | 2020 / Vol. 53
Keywords
Missing values; Imputation; Supervised learning; Incomplete dataset; Data mining
DOI
Not available
Abstract
Missing value imputation (MVI) has been studied for several decades as the basic solution method for incomplete dataset problems, specifically those where some data samples contain one or more missing attribute values. This paper reviews and analyzes related studies carried out in recent decades from the experimental design perspective. Altogether, 111 journal papers published from 2006 to 2017 are reviewed and analyzed. In addition, several technical issues encountered during the MVI process are discussed, such as the choice of datasets, missing rates and missingness mechanisms, and the MVI techniques and evaluation metrics employed. The results of this analysis allow limitations in the existing body of literature to be identified, based upon which some directions for future research can be gleaned.
Pages: 1487-1509
Page count: 22