Imputation of missing values is superior to complete case analysis and the missing-indicator method in multivariable diagnostic research: A clinical example

Cited by: 450
Authors
van der Heijden, Geert J. M. G.
Donders, A. Rogier T.
Stijnen, Theo
Moons, Karel G. M.
Affiliations
[1] Univ Utrecht, Med Ctr, Julius Ctr Hlth Sci & Primary Care, NL-3508 GA Utrecht, Netherlands
[2] Univ Utrecht, Med Ctr, Heart Lung Ctr Utrecht, NL-3508 GA Utrecht, Netherlands
[3] Univ Utrecht, Dept Biostat, NL-3508 GA Utrecht, Netherlands
[4] Univ Utrecht, Copernicus Inst, Dept Innovat Studies, NL-3508 GA Utrecht, Netherlands
[5] Erasmus Univ, Sch Med, Dept Epidemiol & Biostat, NL-3000 DR Rotterdam, Netherlands
Keywords
missing data; complete case analysis; single imputation; multiple imputation; indicator method; bias; precision
DOI
10.1016/j.jclinepi.2006.01.015
Chinese Library Classification
R19 [Health care organization and services (health services administration)]
Abstract
Background and Objectives: To illustrate the effects of different methods for handling missing data (complete case analysis, the missing-indicator method, single imputation of the unconditional and conditional mean, and multiple imputation [MI]) in the context of multivariable diagnostic research aiming to identify potential predictors (test results) that independently contribute to the prediction of disease presence or absence.
Methods: We used data from 398 subjects from a prospective study on the diagnosis of pulmonary embolism. Various diagnostic predictors or tests had (varying percentages of) missing values. For each method of handling these missing values, we fitted a diagnostic prediction model using multivariable logistic regression analysis.
Results: The receiver operating characteristic curve area for all diagnostic models was above 0.75. The predictors in the final models based on the complete case analysis, and after using the missing-indicator method, were very different from those in the other models. The models based on MI did not differ much from the models derived after using single conditional and unconditional mean imputation.
Conclusion: In multivariable diagnostic research, complete case analysis and the use of the missing-indicator method should be avoided, even when data are missing completely at random. MI methods are known to be superior to single imputation methods. For our example study, the single imputation methods performed equally well, but this was most likely because of the low overall number of missing values. © 2006 Elsevier Inc. All rights reserved.
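As a rough illustration of how the single-dataset methods named in the abstract differ, the sketch below applies complete case analysis, the missing-indicator method, and unconditional and conditional mean imputation to a tiny made-up dataset. The variable names (`age`, `d_dimer`), the values, and the use of a one-predictor least-squares fit for the conditional mean are assumptions for illustration only; none of them come from the study. Multiple imputation, which repeats a randomized version of the conditional step and pools the results, is not shown.

```python
from statistics import mean

# Hypothetical toy data, NOT from the study: (age, d_dimer) pairs.
# None marks a missing test result.
rows = [(30, 1.0), (40, None), (50, 2.0), (60, None), (70, 3.0)]


def complete_case(data):
    """Complete case analysis: keep only rows with no missing value."""
    return [r for r in data if None not in r]


def missing_indicator(data, fill=0.0):
    """Missing-indicator method: fill with a fixed value, add a 0/1 flag."""
    return [(x, fill if y is None else y, int(y is None)) for x, y in data]


def unconditional_mean(data):
    """Replace each missing value with the mean of the observed values."""
    m = mean(y for _, y in data if y is not None)
    return [(x, m if y is None else y) for x, y in data]


def conditional_mean(data):
    """Replace each missing value with its least-squares prediction from x."""
    obs = complete_case(data)
    mx = mean(x for x, _ in obs)
    my = mean(y for _, y in obs)
    sxy = sum((x - mx) * (y - my) for x, y in obs)
    sxx = sum((x - mx) ** 2 for x, _ in obs)
    slope = sxy / sxx
    return [(x, my + slope * (x - mx) if y is None else y) for x, y in data]


if __name__ == "__main__":
    for method in (complete_case, missing_indicator,
                   unconditional_mean, conditional_mean):
        print(method.__name__, method(rows))
```

On this toy data, complete case analysis discards two of the five subjects, unconditional mean imputation assigns both missing values the same number, and conditional mean imputation assigns values that follow the trend with `age`. Each imputed dataset could then be passed to a logistic regression fit, as the abstract describes.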
Pages: 1102-1109
Page count: 8