A Noise-Aware Multiple Imputation Algorithm for Missing Data

Cited by: 3
Authors
Li, Fangfang [1]
Sun, Hui [1]
Gu, Yu [1]
Yu, Ge [1]
Affiliations
[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang 110169, Peoples R China
Keywords
noise-aware; missing data; multiple imputation; regression prediction; Markov chain; MODELS
DOI
10.3390/math11010073
Chinese Library Classification (CLC) number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
Missing data is a common and unavoidable phenomenon, and in practical applications datasets usually also contain noise for various reasons. Most existing missing-data imputation algorithms are affected by this noise, which reduces imputation accuracy. This paper proposes NPMI, a noise-aware multiple imputation algorithm for missing data in static datasets. First, different multiple imputation models are proposed according to the missing-data mechanism. Second, a method is given for determining the imputation order when multiple variables have missing values. Third, a random sampling consistency algorithm is proposed to estimate the initial values of the parameters of the multiple imputation model, which reduces the influence of noisy data and improves the algorithm's robustness. Experiments on two real datasets and two synthetic datasets verify the accuracy and efficiency of the proposed NPMI algorithm, and the results are analyzed.
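The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea of using random sampling with a consensus check to initialize regression parameters before imputing. It is not the authors' NPMI algorithm. It assumes a single predictor, a linear model y = a*x + b, and a RANSAC-style procedure; the function names (fit_line, ransac_line, impute), the tolerance, and the toy data are all hypothetical. It also performs a single deterministic imputation, whereas a multiple imputation method would draw several plausible values per missing entry.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = a*x + b over (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        return None  # degenerate sample: all x values are equal
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    a = sxy / sxx
    return a, mean_y - a * mean_x

def ransac_line(points, n_iter=200, tol=1.0, seed=0):
    """RANSAC-style fit (an assumption, not the paper's exact method):
    repeatedly fit a line to 2 random points, keep the largest set of
    points within `tol` of that line, then refit on that set."""
    rng = random.Random(seed)
    best_inliers = []
    for _ in range(n_iter):
        model = fit_line(rng.sample(points, 2))
        if model is None:
            continue
        a, b = model
        inliers = [(x, y) for x, y in points if abs(y - (a * x + b)) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consensus set found")
    return fit_line(best_inliers)

def impute(rows, model):
    """Fill missing y values (None) with the model's prediction."""
    a, b = model
    return [(x, a * x + b if y is None else y) for x, y in rows]

# Toy data (hypothetical): y = 2x + 1, two noisy records, two missing values.
rows = [(float(x), 2.0 * x + 1.0) for x in range(10)]
rows[3] = (3.0, 40.0)    # noisy record
rows[7] = (7.0, -25.0)   # noisy record
rows += [(10.0, None), (11.0, None)]

observed = [(x, y) for x, y in rows if y is not None]
naive_model = fit_line(observed)      # pulled off course by the noisy records
robust_model = ransac_line(observed)  # fitted on the consensus set only
completed = impute(rows, robust_model)
```

On this toy data the plain least-squares slope is far from 2 because of the two noisy records, while the consensus-based fit recovers the underlying line, so the imputed values are not distorted by the noise.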
Pages: 16