MIMCA: multiple imputation for categorical variables with multiple correspondence analysis

Cited by: 32
Authors
Audigier, Vincent [1]
Husson, Francois [1]
Josse, Julie [1]
Affiliation
[1] Agrocampus Ouest, Appl Math Dept, 65 rue St Brieuc, F-35042 Rennes, France
Keywords
Missing values; Categorical data; Multiple imputation; Multiple correspondence analysis; Bootstrap; MAXIMUM-LIKELIHOOD; APPROXIMATION; MATRIX; MODELS; MICE; EM
DOI
10.1007/s11222-016-9635-4
Chinese Library Classification (CLC) number
TP301 [Theory, methods]
Subject classification code
081202
Abstract
We propose a multiple imputation method to deal with incomplete categorical data. This method imputes the missing entries using the principal component method dedicated to categorical data: multiple correspondence analysis (MCA). The uncertainty concerning the parameters of the imputation model is reflected using a non-parametric bootstrap. Because of the dimensionality reduction property of MCA, multiple imputation using MCA (MIMCA) requires estimating only a small number of parameters. This allows it to impute a wide range of data sets: in particular, a large number of categories per variable, a large number of variables, or a small number of individuals are not an issue for MIMCA. Through a simulation study based on real data sets, the method is assessed and compared to the reference methods (multiple imputation using the loglinear model, multiple imputation by logistic regressions) as well as to the latest work on the topic (multiple imputation by random forests or by the Dirichlet process mixture of products of multinomial distributions model). The proposed method provides a good point estimate of the parameters of the analysis model considered, such as the coefficients of a main effects logistic regression model, and a reliable estimate of the variability of the estimators. In addition, MIMCA has the great advantage of being substantially less time-consuming than the other multiple imputation methods on high-dimensional data sets.
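To make the pipeline summarized in the abstract more concrete, here is a minimal NumPy sketch written for illustration; it is not the authors' implementation. It assumes a hypothetical coding in which each categorical variable is stored as integers 0..k-1, with -1 marking a missing entry. The data are turned into an indicator (disjunctive) matrix, missing blocks are filled by iterating a rank-`n_dims` MCA reconstruction, and each imputation uses a different non-parametric bootstrap, expressed as row weights. Two simplifications are assumptions of this sketch: it uses a plain truncated SVD with no regularization, and it turns each imputed block of fuzzy values into a category by clipping negative values to zero, renormalizing, and drawing from the resulting probabilities. The function names `to_indicator`, `iterative_mca` and `mimca` are hypothetical.

```python
import numpy as np


def to_indicator(data, n_levels):
    """One-hot (disjunctive) coding; a missing entry (-1) becomes a block of NaNs."""
    n, n_vars = data.shape
    blocks = []
    for j in range(n_vars):
        block = np.zeros((n, n_levels[j]))
        observed = data[:, j] >= 0
        block[observed, data[observed, j]] = 1.0
        block[~observed, :] = np.nan
        blocks.append(block)
    return np.hstack(blocks)


def iterative_mca(z, weights, n_dims, n_iter=200, tol=1e-8):
    """Fill the NaN cells of indicator matrix z by iterated rank-n_dims MCA reconstruction."""
    missing = np.isnan(z)
    w = weights / weights.sum()
    filled = np.where(missing, np.nanmean(z, axis=0), z)  # start from observed proportions
    for _ in range(n_iter):
        # Weighted category margins, floored to avoid dividing by zero
        # when a bootstrap resample gives a category zero weight.
        p = np.maximum(w @ filled, 1e-12)
        x = (filled - p) / np.sqrt(p)  # MCA centering and scaling of the indicator matrix
        _, _, vt = np.linalg.svd(np.sqrt(w)[:, None] * x, full_matrices=False)
        v = vt[:n_dims].T
        # Project every row (including zero-weight rows) on the first n_dims axes.
        fitted = (x @ v @ v.T) * np.sqrt(p) + p
        new = np.where(missing, fitted, z)  # only missing cells are updated
        converged = np.max(np.abs(new - filled)) < tol
        filled = new
        if converged:
            break
    return filled


def mimca(data, n_levels, n_dims=2, n_imputations=5, seed=0):
    """Return n_imputations completed copies of data (missing entries coded as -1)."""
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    z = to_indicator(data, n_levels)
    starts = np.concatenate(([0], np.cumsum(n_levels)))
    completed = []
    for _ in range(n_imputations):
        # Non-parametric bootstrap expressed as row weights (resampling counts).
        weights = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        fuzzy = iterative_mca(z, weights, n_dims)
        out = data.copy()
        for i, j in zip(*np.nonzero(data < 0)):
            # Assumption: clip negative fuzzy values, renormalize, then draw a category.
            probs = np.clip(fuzzy[i, starts[j]:starts[j + 1]], 0.0, None)
            if probs.sum() > 0:
                probs = probs / probs.sum()
            else:
                probs = np.full(n_levels[j], 1.0 / n_levels[j])
            out[i, j] = rng.choice(n_levels[j], p=probs)
        completed.append(out)
    return completed


# Toy example: 6 individuals, 3 categorical variables with 2, 2 and 3 categories.
data = np.array([[0, 1, 2], [1, 0, -1], [0, -1, 1], [1, 1, 0], [-1, 0, 2], [0, 1, 1]])
imputations = mimca(data, n_levels=[2, 2, 3], n_dims=2, n_imputations=3)
for completed in imputations:
    print(completed[data < 0])  # imputed categories differ across bootstrap replicates
```

Reconstructing rows by projecting them onto the leading right singular vectors, rather than rescaling the left singular vectors by the row weights, keeps individuals that receive zero weight in a bootstrap resample imputable.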
Pages: 501-518
Number of pages: 18