Algorithmic-Type Imputation Techniques with Different Data Structures: Alternative Approaches in Comparison

Cited by: 2
Authors
Solaro, Nadia [1]
Barbiero, Alessandro [2]
Manzi, Giancarlo [2]
Ferrari, Pier Alda [2]
Affiliations
[1] Univ Milano Bicocca, Dept Stat & Quantitat Methods, Milan, Italy
[2] Univ Milan, Dept Econ Management & Quantitat Methods, Milan, Italy
Source
ANALYSIS AND MODELING OF COMPLEX DATA IN BEHAVIORAL AND SOCIAL SCIENCES | 2014
Keywords
Forward imputation; Iterative PCA; missForest; Missing data;
DOI
10.1007/978-3-319-06692-9_27
Chinese Library Classification
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
In recent years, with the widespread availability of large datasets from multiple sources, increasing attention has been devoted to the treatment of missing information. Recent approaches have paved the way for the development of powerful new algorithmic techniques, in which imputation is performed through computer-intensive procedures. Although most of these approaches are attractive for many reasons, less attention has been paid to the problem of which method should be preferred for the data structure at hand. This work addresses the problem by comparing two methods, missForest and IPCA, with a new method we developed within the forward imputation approach. We carried out comparisons on different data patterns with varying skewness and correlation of variables, in order to ascertain in which situations a given method produces more satisfactory results.
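As a rough illustration of the kind of algorithmic imputation mentioned in the abstract, the sketch below shows a generic iterative PCA scheme: missing cells are initialised with column means, then repeatedly replaced by a low-rank SVD reconstruction until the imputed values stabilise. It is not the authors' implementation and does not cover missForest or forward imputation. The function name `iterative_pca_impute`, its parameters, and the simulated data (correlated normal variables with values removed completely at random) are assumptions made for illustration only.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, max_iter=200, tol=1e-6):
    """Fill NaNs in X by alternating a rank-k SVD fit and re-imputation.

    Hypothetical helper: a generic iterative PCA sketch, not the paper's code.
    Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Initialise missing cells with the mean of the observed values per column.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    for _ in range(max_iter):
        mu = filled.mean(axis=0)
        U, s, Vt = np.linalg.svd(filled - mu, full_matrices=False)
        k = n_components
        # Rank-k reconstruction of the centred data, shifted back by the means.
        low_rank = (U[:, :k] * s[:k]) @ Vt[:k] + mu
        # Only missing cells are overwritten; observed cells keep their values.
        updated = np.where(missing, low_rank, X)
        change = np.sqrt(np.mean((updated - filled) ** 2))
        filled = updated
        if change < tol:
            break
    return filled

# Simulated example (assumed setup): 5 equicorrelated normal variables.
rng = np.random.default_rng(0)
n, p, rho = 200, 5, 0.8
cov = np.full((p, p), rho) + (1 - rho) * np.eye(p)
X_true = rng.multivariate_normal(np.zeros(p), cov, size=n)

# Remove about 10% of the cells completely at random.
mask = rng.random((n, p)) < 0.10
X_miss = X_true.copy()
X_miss[mask] = np.nan

X_hat = iterative_pca_impute(X_miss, n_components=1)
rmse = np.sqrt(np.mean((X_hat[mask] - X_true[mask]) ** 2))
print(f"RMSE on imputed cells: {rmse:.3f}")
```

Varying `rho`, or replacing the normal draws with a skewed distribution, gives a simple way to see how such a method's accuracy depends on the correlation and skewness of the variables, which is the kind of comparison the abstract describes.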
Pages: 253-261
Number of pages: 9