A simulation comparison of imputation methods for quantitative data in the presence of multiple data patterns

Cited by: 10
Authors
Solaro, N. [1]
Barbiero, A. [2]
Manzi, G. [2]
Ferrari, P. A. [2]
Affiliations
[1] Univ Milano Bicocca, Dept Stat & Quantitat Methods, Via Bicocca Arcimboldi 8, I-20126 Milan, Italy
[2] Univ Milan, Dept Econ Management & Quantitat Methods, Milan, Italy
Keywords
Forward imputation; iterative principal component analysis; Mahalanobis distance; missForest; missing data; Monte Carlo simulation; multivariate exponential power distribution; multivariate skew-normal distribution; nearest-neighbour imputation; MISSING DATA; MULTIVARIATE;
DOI
10.1080/00949655.2018.1530773
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
An extensive investigation via simulation is carried out with the aim of comparing three nonparametric, single imputation methods in the presence of multiple data patterns. The ultimate goal is to provide useful hints for users needing to quickly pick the most effective imputation method among the following: Forward Imputation (ForImp), considered in the two variants of ForImp with the principal component analysis (PCA), which alternates the use of PCA and the Nearest-Neighbour Imputation (NNI) method in a forward, sequential procedure, and ForImp with the Mahalanobis distance, which involves the use of the Mahalanobis distance when performing NNI; the iterative PCA technique, which imputes missing values simultaneously via PCA; and the missForest method, which is based on random forests and is developed for mixed-type data. The performance of these methods is compared under several data patterns characterized by different levels of kurtosis or skewness and correlation structures.
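As an illustration of the iterative PCA idea named in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes missing cells are coded as NaN, starts from column means, and then alternates a rank-k PCA reconstruction with re-filling only the missing cells. The function name `iterative_pca_impute` and the parameters `n_components`, `max_iter`, and `tol` are hypothetical choices made for this example.

```python
import numpy as np


def iterative_pca_impute(X, n_components=2, max_iter=200, tol=1e-8):
    """Fill NaNs in X by alternating a rank-k PCA fit and re-imputation.

    Illustrative sketch only; the stopping rule and the starting values
    (column means) are assumptions, not taken from the paper.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Initial guess: column means computed from the observed values.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    for _ in range(max_iter):
        # PCA of the currently completed matrix via SVD of the centred data.
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mu

        # Observed cells are kept; only missing cells take the reconstruction.
        updated = np.where(missing, recon, X)
        change = np.sum((updated - filled) ** 2)
        filled = updated
        if change < tol:
            break

    return filled
```

All missing cells are updated together in each pass, which is the sense in which the abstract says the iterative PCA technique imputes missing values simultaneously. In practice the number of retained components would have to be chosen for the data at hand.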
Pages: 3588-3619
Number of pages: 32