Bias in regression coefficient estimates when assumptions for handling missing data are violated: a simulation study

Cited by: 7
Authors
van Kuijk, Sander M. J. [1, 2]
Viechtbauer, Wolfgang [3 ]
Peeters, Louis L. [4 ]
Smits, Luc [2 ]
Affiliations
[1] Maastricht Univ, Clin Epidemiol & Med Technol Assessment, Med Ctr, NL-6200 MD Maastricht, Netherlands
[2] Maastricht Univ, Epidemiol, POB 616, NL-6200 MD Maastricht, Netherlands
[3] Maastricht Univ, Stat & Methodol, POB 616, NL-6200 MD Maastricht, Netherlands
[4] Maastricht Univ, Med Ctr, Obstet & Gynecol, POB 5800, NL-6202 AZ Maastricht, Netherlands
Keywords
multiple imputation; complete case analysis; missing data; bias; regression;
DOI
10.2427/11598
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene];
Subject classification code
1004 ; 120402 ;
Abstract
Background: The purpose of this simulation study is to compare bias in the estimation of regression coefficients between multiple imputation (MI) and complete case (CC) analysis when assumptions of missing data mechanisms are violated. Methods: The authors performed a stochastic simulation study in which data were drawn from a multivariate normal distribution, and missing values were created according to different missing data mechanisms (missing completely at random (MCAR), at random (MAR), and not at random (MNAR)). Data were analysed with a linear regression model using CC analysis, and after MI. In addition, characteristics of the data (i.e. correlation, size of the regression coefficients, error variance, proportion of missing data) were varied to assess the influence on the size and sign of bias. Results: When data were MAR conditional on Y, CC analysis resulted in severely biased regression coefficients; they were consistently underestimated in our scenarios. In the same scenarios, analysis after MI gave correct estimates. Yet, in case of MNAR, MI yielded biased regression coefficients, while CC analysis did not result in biased estimates, contrary to expectation. Conclusion: The authors demonstrated that MI was only superior to CC analysis in case of MCAR or MAR, with respect to bias and precision. In some scenarios CC may be superior to MI. Often it is not feasible to identify the cause of incomplete data in a given dataset. Therefore, emphasis should be placed on reporting the extent of missing values, the method that was used to address the problem, and the assumptions that were made about the mechanism that caused missing data.
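The complete case part of such a simulation can be illustrated with a minimal sketch. This is not the authors' code (their reference list points to R); it is a Python approximation under assumptions that the abstract does not state: two standard normal variables X and Y with correlation `rho`, missing values placed in X only, logistic missingness models with an arbitrary slope of 2, and an MNAR mechanism that depends on X itself. The function name `simulate_cc_slopes` and all parameter values are hypothetical, and the multiple imputation step is omitted.

```python
import numpy as np


def simulate_cc_slopes(n=200_000, rho=0.5, seed=0):
    """Complete case (CC) slope of Y on X under three missingness mechanisms.

    Assumed setup (not taken from the paper): X and Y are standard normal
    with correlation rho, so the true slope of Y on X equals rho.
    Missing values are created in X only.
    """
    rng = np.random.default_rng(seed)
    cov = [[1.0, rho], [rho, 1.0]]
    x, y = rng.multivariate_normal([0.0, 0.0], cov, size=n).T

    def logistic(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Probability that X is missing for each record, per mechanism.
    p_missing = {
        "MCAR": np.full(n, 0.5),    # unrelated to the data
        "MAR": logistic(2.0 * y),   # depends on the fully observed Y
        "MNAR": logistic(2.0 * x),  # depends on the unobserved X itself
    }

    slopes = {}
    for name, p in p_missing.items():
        observed = rng.random(n) >= p  # True where X is kept
        # CC analysis: ordinary least squares on complete records only.
        slope, _intercept = np.polyfit(x[observed], y[observed], 1)
        slopes[name] = slope
    return slopes


if __name__ == "__main__":
    for mechanism, slope in simulate_cc_slopes().items():
        print(f"{mechanism}: CC slope = {slope:.3f} (true slope 0.5)")
```

Under these assumptions the CC slope should come out clearly below the true value when missingness depends on Y, and close to the true value under MCAR and under this particular MNAR mechanism, which is the pattern the abstract reports for CC analysis. Selecting records on the predictor leaves the regression of Y on X intact, whereas selecting on the outcome attenuates it.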
Pages: 8