Multiple imputation of missing at random data: General points and presentation of a Monte-Carlo method

Cited by: 16
Authors
Cottrell, G. [1]
Cot, M. [2]
Mary, J.-Y. [3]
Affiliations
[1] Inst Rech Dev, UR010, Cotonou, Benin
[2] Inst Rech Dev, Sante Mere & Enfant Milieu Trop UR010, Paris, France
[3] Univ Paris 07, INSERM, Hop St Louis, U717, Paris, France
Source
REVUE D EPIDEMIOLOGIE ET DE SANTE PUBLIQUE | 2009 / Vol. 57 / No. 5
Keywords
Missing data; Missing at random; Multiple imputation; MCMC; Fully conditional specification; Sensitivity analysis; Incomplete data; Binary data; Drop-out; Models
DOI
10.1016/j.respe.2009.04.011
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Subject classification code
1004; 120402
Abstract
Background. - Statistical analysis of data sets with missing data is a frequent problem in epidemiology. Methods are available to handle incomplete observations that avoid biased estimates and improve precision compared with more traditional approaches, such as analysing only the sub-sample of complete observations.

Methods. - One of these approaches is multiple imputation, which consists of successively imputing several values for each missing data item. Several completed data sets with the same distributional characteristics as the observed data (variability and correlations) are thus generated. Standard analyses are performed separately on each completed data set, and the results are then combined to obtain an overall result. In this paper, we discuss the various assumptions that can be made about the mechanism that generated the missing data (missing at random or not), and we present the multiple imputation process in a pragmatic way. A recent method, Multiple Imputation by Chained Equations (MICE), based on a Markov chain Monte Carlo (MCMC) algorithm under the missing at random (MAR) hypothesis, is described. The MICE method is illustrated in detail with an analysis, by multivariate logistic regression, of the relationship between a dichotomous variable and two covariates with MAR data following no particular missingness pattern.

Results. - Taking the original data set without missing data as the reference, the regression coefficients estimated with the MICE method were substantially closer to the reference values than those obtained from the complete observations only.

Conclusion. - This method does not require any direct assumption about the joint distribution of the variables, and it is currently implemented in standard statistical software (Splus, Stata). It can be used for the multiple imputation of missing data in several variables with no particular structure.

© 2009 Elsevier Masson SAS. All rights reserved.
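The impute / analyse / combine sequence described in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' code and does not reproduce their logistic-regression example. It assumes two numeric variables and imputes each from a linear regression on the other. This is a simplified chained-equations loop: unlike proper MICE implementations, it does not draw the regression parameters from their posterior. Each completed data set is then analysed by estimating a mean, and the m results are pooled with Rubin's rules, which we assume is the combining step the abstract refers to. All function names (`fit_linear`, `impute_chained`, `analyse`, `rubin_pool`) and the toy data are hypothetical.

```python
import random
import statistics


def fit_linear(xs, ys):
    """Least-squares fit of y = a + b * x; returns (a, b, residual SD).

    Needs at least three points and some spread in xs.
    """
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def impute_chained(x1, x2, n_iter=10, seed=0):
    """Return one completed copy of two numeric columns (None marks a gap).

    Simplified chained-equations loop: each incomplete column is regressed
    on the other, using the current imputations of the predictor, and its
    gaps are redrawn as fitted value + Gaussian noise. Unlike proper MICE,
    the regression parameters are not drawn from their posterior.
    """
    rng = random.Random(seed)
    cols = [list(x1), list(x2)]  # copies, so the inputs are not modified
    miss = [[v is None for v in col] for col in cols]
    # Initialise every gap with the mean of that column's observed values.
    for col, m in zip(cols, miss):
        obs_mean = statistics.fmean(v for v, gap in zip(col, m) if not gap)
        for i, gap in enumerate(m):
            if gap:
                col[i] = obs_mean
    # Cycle through the two conditional models n_iter times.
    for _ in range(n_iter):
        for j in (0, 1):
            target, pred, m = cols[j], cols[1 - j], miss[j]
            rows = [i for i, gap in enumerate(m) if not gap]
            a, b, sd = fit_linear([pred[i] for i in rows], [target[i] for i in rows])
            for i, gap in enumerate(m):
                if gap:
                    target[i] = a + b * pred[i] + rng.gauss(0.0, sd)
    return cols[0], cols[1]


def analyse(col):
    """Per-data-set analysis: the sample mean and its squared standard error."""
    return statistics.fmean(col), statistics.variance(col) / len(col)


def rubin_pool(estimates, variances):
    """Pool m estimates and their variances with Rubin's rules."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    u_bar = statistics.fmean(variances)  # mean within-imputation variance
    b = statistics.variance(estimates)  # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b  # (estimate, total variance)


if __name__ == "__main__":
    # Hypothetical toy data: x2 goes missing mostly when x1 is high, so
    # missingness depends only on the fully observed x1 (MAR).
    gen = random.Random(42)
    n = 500
    x1 = [gen.gauss(0.0, 1.0) for _ in range(n)]
    x2_full = [1.0 + 0.8 * v + gen.gauss(0.0, 0.6) for v in x1]
    x2 = [None if (v > 0.0 and gen.random() < 0.6) else w for v, w in zip(x1, x2_full)]

    m = 5  # number of completed data sets
    results = [analyse(impute_chained(x1, x2, seed=k)[1]) for k in range(m)]
    pooled, total_var = rubin_pool([r[0] for r in results], [r[1] for r in results])
    complete_case = statistics.fmean(v for v in x2 if v is not None)

    print(f"full-data mean     : {statistics.fmean(x2_full):.3f}")
    print(f"complete-case mean : {complete_case:.3f}")
    print(f"MI pooled mean     : {pooled:.3f} (SE {total_var ** 0.5:.3f})")
```

For the toy data, missingness in x2 depends on the observed x1, and x2 increases with x1. The complete-case mean is therefore expected to be biased downward, while the pooled imputation estimate should lie close to the full-data mean. A real analysis would use a model suited to each variable's type (for example, logistic regression for a binary covariate) and a dedicated implementation such as the software named in the abstract.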
Pages: 361-372
Page count: 12