Missing value imputation: a review and analysis of the literature (2006–2017)

Cited by: 6
Authors
Wei-Chao Lin
Chih-Fong Tsai
Affiliations
[1] Chang Gung University, Department of Information Management
[2] Chang Gung University, Healthy Aging Research Center
[3] Chang Gung Memorial Hospital, Department of Thoracic Surgery
[4] National Central University, Department of Information Management
Source
Artificial Intelligence Review | 2020 / Vol. 53
Keywords
Missing values; Imputation; Supervised learning; Incomplete dataset; Data mining
DOI
Not available
Abstract
Missing value imputation (MVI) has been studied for several decades as the basic solution method for incomplete dataset problems, specifically those where some data samples contain one or more missing attribute values. This paper aims to review and analyze related studies carried out in recent decades from the experimental design perspective. Altogether, 111 journal papers published from 2006 to 2017 are reviewed and analyzed. In addition, several technical issues encountered during the MVI process are discussed, such as the choice of datasets, missing rates and missingness mechanisms, and the MVI techniques and evaluation metrics employed. The analysis of these issues allows limitations in the existing body of literature to be identified, from which some directions for future research can be drawn.
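The experimental-design elements named in the abstract (a dataset, a missing rate, a missingness mechanism, an MVI technique, and an evaluation metric) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the reviewed paper: the synthetic dataset, the 20% missing rate, the choice of missing completely at random (MCAR) as the mechanism, mean imputation as the technique, RMSE as the metric, and the helper names `inject_mcar`, `mean_impute`, and `rmse_on_missing` are all hypothetical.

```python
import math
import random


def inject_mcar(rows, missing_rate, rng):
    """Return a copy of rows where each cell is set to None with
    probability missing_rate, independently of all values (MCAR)."""
    return [[None if rng.random() < missing_rate else v for v in row]
            for row in rows]


def mean_impute(rows):
    """Replace each None with the mean of the observed values in its column."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # Fall back to 0.0 if a column has no observed values at all.
        means.append(sum(observed) / len(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


def rmse_on_missing(complete, masked, imputed):
    """Root mean squared error between true and imputed values,
    computed over the artificially removed cells only."""
    errors = [
        (complete[i][j] - imputed[i][j]) ** 2
        for i, row in enumerate(masked)
        for j, v in enumerate(row)
        if v is None
    ]
    return math.sqrt(sum(errors) / len(errors)) if errors else 0.0


# Hypothetical complete dataset: 200 samples, 3 numeric attributes.
rng = random.Random(0)
complete = [[rng.gauss(0, 1), rng.gauss(5, 2), rng.gauss(-3, 0.5)]
            for _ in range(200)]

masked = inject_mcar(complete, missing_rate=0.2, rng=rng)  # simulate missingness
imputed = mean_impute(masked)                              # apply the MVI technique
rmse = rmse_on_missing(complete, masked, imputed)          # evaluate the imputation
print(f"RMSE on imputed cells: {rmse:.3f}")
```

Starting from a complete dataset and removing values artificially is what makes the evaluation possible, since the true values of the imputed cells are known. Swapping in a different mechanism, rate, imputer, or metric changes only one function in this sketch.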
Pages: 1487-1509
Page count: 22