A novel framework for imputation of missing values in databases

Cited by: 156
Authors
Farhangfar, Alireza
Kurgan, Lukasz A.
Pedrycz, Witold
Affiliations
[1] Department of Electrical and Computer Engineering, University of Alberta, Edmonton
Source
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART A-SYSTEMS AND HUMANS | 2007 / Vol. 37 / No. 05
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
accuracy; databases; missing values; multiple imputation (MI); single imputation;
DOI
10.1109/TSMCA.2007.902631
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
Many of the industrial and research databases are plagued by the problem of missing values. Some evident examples include databases associated with instrument maintenance, medical applications, and surveys. One of the common ways to cope with missing values is to complete their imputation (filling in). Given the rapid growth of sizes of databases, it becomes imperative to come up with a new imputation methodology along with efficient algorithms. The main objective of this paper is to develop a unified framework supporting a host of imputation methods. In the development of this framework, we require that its usage should (on average) lead to the significant improvement of accuracy of imputation while maintaining the same asymptotic computational complexity of the individual methods. Our intent is to provide a comprehensive review of the representative imputation techniques. It is noticeable that the use of the framework in the case of a low-quality single-imputation method has resulted in the imputation accuracy that is comparable to the one achieved when dealing with some other advanced imputation techniques. We also demonstrate, both theoretically and experimentally, that the application of the proposed framework leads to a linear computational complexity and, therefore, does not affect the asymptotic complexity of the associated imputation method.
Pages: 692-709
Number of pages: 18
Related papers
59 in total
[1]  
Acuña E, 2004, ST CLASS DAT ANAL, P639
[2]  
ALZOLA C, 1999, INTRO S PLUS HMISC D
[3]  
[Anonymous], P EUR C MACH LEARN E
[4]  
[Anonymous], 1997, Analysis of incomplete multivariate data
[5]  
[Anonymous], [No title captured]
[6]  
[Anonymous], 1998, DATA MINING METHODS
[7]   Applications of multiple imputation in medical studies: from AIDS to NHANES [J].
Barnard, J ;
Meng, XL .
STATISTICAL METHODS IN MEDICAL RESEARCH, 1999, 8 (01) :17-36
[8]  
Batista GEAPA, 2003, APPL ARTIF INTELL, V17, P519, DOI [10.1080/713827181, 10.1080/08839510390219309]
[9]  
BLAKE JPL, 1998, UCI REPOSITORY MACHI
[10]  
Brand JPL., 1999, DEV IMPLEMENTATION E