Multiple Imputation for Multilevel Data with Continuous and Binary Variables

Cited by: 96
Authors
Audigier, Vincent [1]
White, Ian R. [2, 3]
Jolani, Shahab [4]
Debray, Thomas P. A. [5]
Quartagno, Matteo [3, 6]
Carpenter, James [3, 6]
van Buuren, Stef
Resche-Rigon, Matthieu
Affiliations
[1] Cedric MSDMA, CNAM, Paris, France
[2] MRC, Biostat Unit, Cambridge Inst Publ Hlth, Cambridge, England
[3] UCL, MRC, Clin Trials Unit, London, England
[4] Maastricht Univ, Sch CAPHRI, Care & Publ Hlth Res Inst, Dept Methodol & Stat, Maastricht, Netherlands
[5] Univ Med Ctr Utrecht, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
[6] London Sch Hyg & Trop Med, Dept Med Stat, London, England
Funding
UK Medical Research Council
Keywords
Missing data; systematically missing values; multilevel data; mixed data; multiple imputation; joint modelling; fully conditional specification; INDIVIDUAL PATIENT DATA; MIXED-EFFECTS MODELS; MISSING-DATA; CHAINED EQUATIONS; IPD METAANALYSIS; HEART-FAILURE; DISTRIBUTIONS; HETEROGENEITY
DOI
10.1214/18-STS646
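Chinese Library Classification (CLC) number
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline classification code
020208; 070103; 0714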
Abstract
We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable multiple imputation method according to the structure of the data.
Pages: 160-183
Number of pages: 24
Related papers
81 in total
[21]  
Enders C.K., 2010, APPL MISSING DATA AN
[22]  
Enders C.K., 2017, PSYCHOL METHODS
[23]   Multilevel Multiple Imputation: A Review and Evaluation of Joint Modeling and Chained Equations Imputation [J].
Enders, Craig K. ;
Mistler, Stephen A. ;
Keller, Brian T. .
PSYCHOLOGICAL METHODS, 2016, 21 (02) :222-240
[24]   Dealing with missing covariates in epidemiologic studies: a comparison between multiple imputation and a full Bayesian approach [J].
Erler, Nicole S. ;
Rizopoulos, Dimitris ;
van Rosmalen, Joost ;
Jaddoe, Vincent W. V. ;
Franco, Oscar H. ;
Lesaffre, Emmanuel M. E. H. .
STATISTICS IN MEDICINE, 2016, 35 (17) :2955-2974
[25]   BIAS REDUCTION OF MAXIMUM-LIKELIHOOD-ESTIMATES [J].
FIRTH, D .
BIOMETRIKA, 1993, 80 (01) :27-38
[26]   Prior distributions for variance parameters in hierarchical models (Comment on an Article by Browne and Draper) [J].
Gelman, Andrew .
BAYESIAN ANALYSIS, 2006, 1 (03) :515-533
[27]   STOCHASTIC RELAXATION, GIBBS DISTRIBUTIONS, AND THE BAYESIAN RESTORATION OF IMAGES [J].
GEMAN, S ;
GEMAN, D .
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1984, 6 (06) :721-741
[28]  
GLOBAL RESEARCH ACUTE TEAM (GREAT) ON CONDITIONS NETWORK, 2013, MAN AC HEART FAIL ED
[29]   Multilevel structural equation models for the analysis of comparative data on educational performance [J].
Goldstein, Harvey ;
Bonnet, Gerard ;
Rocher, Thierry .
JOURNAL OF EDUCATIONAL AND BEHAVIORAL STATISTICS, 2007, 32 (03) :252-286
[30]   Multilevel models with multivariate mixed response types [J].
Goldstein, Harvey ;
Carpenter, James ;
Kenward, Michael G. ;
Levin, Kate A. .
STATISTICAL MODELLING, 2009, 9 (03) :173-197