Multiple imputation of missing at random data: General points and presentation of a Monte-Carlo method

Times cited: 16

Authors:

Cottrell, G. [1]

Cot, M. [2]

Mary, J.-Y. [3]

Affiliations:

[1] Inst Rech Dev, UR010, Cotonou, Benin

[2] Inst Rech Dev, Sante Mere & Enfant Milieu Trop UR010, Paris, France

[3] Univ Paris 07, INSERM, Hop St Louis, U717, Paris, France

Source:

REVUE D EPIDEMIOLOGIE ET DE SANTE PUBLIQUE | 2009 / Vol. 57 / No. 5

Keywords:

Missing data; Missing at random; Multiple imputation; MCMC; Fully conditional specification; Sensitivity analysis; Incomplete data; Binary data; Drop-out; Models

DOI:

10.1016/j.respe.2009.04.011

Chinese Library Classification (CLC) number:

R1 [Preventive medicine and hygiene]

Subject classification codes:

1004; 120402

Abstract:

Background. Statistical analysis of a data set with missing data is a frequent problem in epidemiology. Methods are available for handling incomplete observations that avoid biased estimates and improve precision compared with more traditional approaches, such as analysing only the sub-sample of complete observations.

Methods. One of these approaches is multiple imputation, which consists of successively imputing several values for each missing data item. This generates several completed data sets that share the distributional characteristics of the observed data (variability and correlations). Standard analyses are performed separately on each completed data set, and the results are then combined into a single overall result. In this paper, we discuss the various assumptions that can be made about the origin of the missing data (missing at random or not), and we present the multiple imputation process in a pragmatic way. A recent method, Multiple Imputation by Chained Equations (MICE), based on a Markov chain Monte Carlo (MCMC) algorithm under the missing at random (MAR) hypothesis, is described. An illustrative example of the MICE method is detailed: the analysis, through multivariate logistic regression, of the relation between a dichotomous variable and two covariates with MAR data and no particular missing-data pattern.

Results. Taking the original data set without missing data as the reference, the regression coefficient estimates obtained with the MICE method were substantially improved relative to those obtained from the complete observations only.

Conclusion. This method does not require any direct assumption about the joint distribution of the variables, and it is implemented in standard statistical software (S-Plus, Stata). It can be used for the multiple imputation of missing data in several variables with no particular pattern. (C) 2009 Elsevier Masson SAS. All rights reserved.
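The impute, analyse, pool workflow summarised in the abstract can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the authors' procedure: the two covariates are continuous and each is imputed from a linear regression on the other plus normal residual noise, cycling a fixed number of times (the chained-equations idea); regression parameters are not redrawn between imputations, so this is a simplified, "improper" form of MICE; the analysis of interest is just the mean of x1 rather than the paper's logistic regression; and the helpers `fit_line`, `impute_chained` and `rubin_pool` are hypothetical names. Pooling follows Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance is T = U_bar + (1 + 1/m) B, where U_bar is the average within-imputation variance and B the between-imputation variance.

```python
import random
import statistics


def fit_line(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b, residual SD)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (len(xs) - 2)) ** 0.5


def impute_chained(x1, x2, rng, n_iter=10):
    """Hypothetical helper: one completed copy of (x1, x2); None = missing.

    Each variable is regressed on the other and its missing entries are
    replaced by prediction + normal noise, cycling n_iter times.
    Simplification: regression parameters are NOT drawn from their
    posterior, unlike proper MICE implementations.
    """
    miss1 = [v is None for v in x1]
    miss2 = [v is None for v in x2]
    # Start from the observed mean of each variable.
    m1 = statistics.fmean(v for v in x1 if v is not None)
    m2 = statistics.fmean(v for v in x2 if v is not None)
    c1 = [m1 if v is None else v for v in x1]
    c2 = [m2 if v is None else v for v in x2]
    for _ in range(n_iter):
        for target, pred, miss in ((c1, c2, miss1), (c2, c1, miss2)):
            # Fit target ~ pred on rows where target was actually observed.
            rows = [i for i, is_miss in enumerate(miss) if not is_miss]
            a, b, sd = fit_line([pred[i] for i in rows], [target[i] for i in rows])
            # Redraw every originally missing entry (lists are updated in place).
            for i, is_miss in enumerate(miss):
                if is_miss:
                    target[i] = a + b * pred[i] + rng.gauss(0, sd)
    return c1, c2


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled estimate and total variance from m analyses."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)   # average point estimate
    u_bar = statistics.fmean(variances)   # within-imputation variance
    b = statistics.variance(estimates)    # between-imputation variance
    return q_bar, u_bar + (1 + 1 / m) * b


# Simulated data: x2 is MCAR; x1 is MAR (more often missing when observed x2 > 0).
rng = random.Random(2009)
n = 200
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [0.8 * v + rng.gauss(0, 0.6) for v in x1]
x2_obs = [None if rng.random() < 0.15 else v for v in x2]
x1_obs = [
    None if o is not None and rng.random() < (0.5 if o > 0 else 0.1) else v
    for v, o in zip(x1, x2_obs)
]

m = 5
estimates, variances = [], []
for _ in range(m):
    c1 = impute_chained(x1_obs, x2_obs, rng)[0]
    estimates.append(statistics.fmean(c1))
    variances.append(statistics.variance(c1) / n)  # squared SE of the mean
q, t = rubin_pool(estimates, variances)

full_mean = statistics.fmean(x1)
cc_mean = statistics.fmean(v for v in x1_obs if v is not None)
print(f"full-data mean of x1:     {full_mean:.3f}")
print(f"complete-case mean of x1: {cc_mean:.3f}")
print(f"MI pooled mean of x1:     {q:.3f} (SE {t ** 0.5:.3f})")
```

Because x1 is preferentially missing when x2 (and therefore x1) is large, the complete-case mean is pulled downward, while the pooled multiple-imputation estimate should sit much closer to the full-data mean; the exact figures depend on the random draws. For real analyses, and for the binary outcome and logistic model used in the paper, an established implementation such as the R package mice or Stata's `mi impute chained` would be the natural choice over this sketch.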


Pages: 361-372

Number of pages: 12

Number of references: 37