POP algorithm: Kernel-based imputation to treat missing values in knowledge discovery from databases

Cited by: 46
Authors
Qin, Yongsong [1]
Zhang, Shichao [2]
Zhu, Xiaofeng [1]
Zhang, Jilian [1]
Zhang, Chengqi [2]
Affiliations
[1] Guangxi Normal Univ, Sch Comp Sci & Informat Technol, Gui Lin, Guangxi, Peoples R China
[2] Univ Technol Sydney, Fac Informat Technol, Broadway, NSW 2007, Australia
Keywords
Knowledge discovery; Missing value; Random regression imputation; Deterministic regression imputation; Likelihood-based inference; Classification
DOI
10.1016/j.eswa.2008.01.059
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
One way to fill in missing values is to exploit correlations among the attributes of the data. The difficulty is that such relations are hard to identify in data that themselves contain missing values. Accordingly, this paper develops a kernel-based method for missing data imputation. The approach aims at optimal inference on statistical parameters (the mean, the distribution function, and quantiles) after the missing data are imputed. We refer to this approach as the parameter optimization method (POP algorithm). We evaluate the approach experimentally and show that the POP algorithm (random regression imputation) is much better than deterministic regression imputation, both in efficiency and in the inferences it yields on the above parameters. (C) 2008 Elsevier Ltd. All rights reserved.
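The contrast between deterministic and random regression imputation can be illustrated with a minimal sketch. It is not the paper's POP procedure: it assumes a single predictor, a Gaussian kernel, a Nadaraya-Watson estimator, and a hand-picked bandwidth `h`, and the function names (`nw_predict`, `impute`) are hypothetical. Deterministic imputation fills each gap with the kernel regression estimate; the random variant adds a residual drawn at random from the observed cases, which keeps more of the spread of the data.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumed choice, not taken from the paper)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nw_predict(x0, xs, ys, h):
    """Nadaraya-Watson kernel estimate of E[Y | X = x0] from complete pairs."""
    weights = [gaussian_kernel((x0 - x) / h) for x in xs]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to the plain mean.
        return sum(ys) / len(ys)
    return sum(w * y for w, y in zip(weights, ys)) / total


def impute(xs, ys, h=0.5, random_residual=False, rng=None):
    """Return a copy of ys in which every None has been imputed.

    random_residual=False: deterministic regression imputation.
    random_residual=True:  random regression imputation (estimate plus a
                           residual sampled from the observed cases).
    """
    rng = rng or random.Random(0)
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    obs_x = [x for x, _ in observed]
    obs_y = [y for _, y in observed]
    # Residuals of the observed cases around their own kernel estimates.
    residuals = [y - nw_predict(x, obs_x, obs_y, h) for x, y in observed]

    filled = []
    for x, y in zip(xs, ys):
        if y is not None:
            filled.append(y)
            continue
        value = nw_predict(x, obs_x, obs_y, h)
        if random_residual:
            value += rng.choice(residuals)
        filled.append(value)
    return filled


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, None, 3.0, 4.0]
deterministic = impute(xs, ys)
randomized = impute(xs, ys, random_residual=True)
```

Statistics such as the mean or a quantile can then be computed on the completed list; how well they are estimated is the kind of comparison the abstract describes.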
Pages: 2794-2804
Number of pages: 11