Using the outcome for imputation of missing predictor values was preferred

Cited by: 748
Authors
Moons, Karel G. M.
Donders, Rogier A. R. T.
Stijnen, Theo
Harrell, Frank E., Jr.
Affiliations
[1] Univ Utrecht, Med Ctr, Julius Ctr Hlth Sci & Gen Practice, NL-3508 GA Utrecht, Netherlands
[2] Univ Utrecht, Copernicus Inst, Dept Innovat Studies, NL-3508 GA Utrecht, Netherlands
[3] Erasmus Univ, Med Ctr, Dept Epidemiol & Biostat, NL-3000 DR Rotterdam, Netherlands
[4] Vanderbilt Univ, Med Ctr, Dept Biostat, Nashville, TN 37232 USA
Keywords
bias; imputation; missing predictors; precision; prediction
DOI
10.1016/j.jclinepi.2006.01.009
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background and Objective: Epidemiologic studies commonly estimate associations between predictors (risk factors) and outcome. Most software packages automatically exclude subjects with missing values. This commonly causes bias because values are seldom missing completely at random (MCAR); rather, they are missing selectively based on other (observed) variables, that is, missing at random (MAR). Multiple imputation (MI) of missing predictor values using all observed information, including the outcome, is advocated to deal with selective missing values. This seems a self-fulfilling prophecy. Methods: We tested this hypothesis using data from a study on diagnosis of pulmonary embolism. We selected five predictors of pulmonary embolism without missing values. Their regression coefficients and standard errors (SEs) estimated from the original sample were considered as "true" values. We assigned missing values to these predictors (both MCAR and MAR) and repeated this 1,000 times using simulations. Per simulation we multiply imputed the missing values without and with the outcome, and compared the regression coefficients and SEs to the truth. Results: Regression coefficients based on MI including the outcome were close to the truth. MI without the outcome yielded severely biased (underestimated) coefficients. SEs and coverage of the 90% confidence intervals were not different between MI with and without the outcome. Results were the same for MCAR and MAR. Conclusion: For all types of missing values, imputation of missing predictor values using the outcome is preferred over imputation without the outcome and is no self-fulfilling prophecy. © 2006 Elsevier Inc. All rights reserved.
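The attenuation reported in the Results can be illustrated with a small simulation. The sketch below is not the authors' procedure: it assumes a single normally distributed predictor, a linear (not logistic) outcome model, MCAR missingness only, and a simplified stochastic regression imputation that skips the parameter-uncertainty draw of proper MI. All function names and settings (`simulate`, `pooled_slope`, `p_missing`, and so on) are hypothetical choices made for illustration.

```python
import random
import statistics


def ols_slope_intercept(x, y):
    """Least-squares slope and intercept for the regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def impute_once(x, y, use_outcome, rng):
    """Return a copy of x in which missing entries (None) are replaced by random draws."""
    observed = [(xi, yi) for xi, yi in zip(x, y) if xi is not None]
    x_obs = [xi for xi, _ in observed]
    y_obs = [yi for _, yi in observed]
    if use_outcome:
        # Imputation model includes the outcome: regress x on y in complete cases,
        # then draw each missing x from the fitted value plus residual noise.
        b, a = ols_slope_intercept(y_obs, x_obs)
        resid_sd = statistics.stdev([xi - (a + b * yi) for xi, yi in observed])
        return [xi if xi is not None else rng.gauss(a + b * yi, resid_sd)
                for xi, yi in zip(x, y)]
    # Imputation model ignores the outcome: draw from the marginal distribution of x.
    mean_x, sd_x = statistics.fmean(x_obs), statistics.stdev(x_obs)
    return [xi if xi is not None else rng.gauss(mean_x, sd_x) for xi in x]


def pooled_slope(x, y, use_outcome, rng, m=5):
    """Average the slope of y on x over m imputed data sets (pooled point estimate)."""
    slopes = [ols_slope_intercept(impute_once(x, y, use_outcome, rng), y)[0]
              for _ in range(m)]
    return statistics.fmean(slopes)


def simulate(n=2000, beta=1.0, p_missing=0.4, seed=1):
    """Generate data with true slope beta, delete x values MCAR, impute both ways."""
    rng = random.Random(seed)
    x_full = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x_full]
    x = [None if rng.random() < p_missing else xi for xi in x_full]
    return {
        "without_outcome": pooled_slope(x, y, False, rng),
        "with_outcome": pooled_slope(x, y, True, rng),
    }


result = simulate()
print(result)
```

Under these assumptions, values imputed without the outcome are unrelated to it by construction, so the pooled slope should shrink toward zero roughly in proportion to the fraction missing, whereas imputation that conditions on the outcome preserves the predictor-outcome relation and should land near the true slope.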
Pages: 1092-1101
Number of pages: 10