Missing Data in Clinical Research: A Tutorial on Multiple Imputation

Cited by: 562
Authors
Austin, Peter C. [1, 2, 3]
White, Ian R. [4]
Lee, Douglas S. [1, 2, 5, 6, 7]
van Buuren, Stef [8, 9]
Affiliations
[1] Inst Clin Evaluat Sci, G106,2075 Bayview Ave, Toronto, ON M4N 3M5, Canada
[2] Univ Toronto, Inst Hlth Policy Management & Evaluat, Toronto, ON, Canada
[3] Sunnybrook Res Inst, Toronto, ON, Canada
[4] UCL, Med Res Council, Clin Trials Unit, London, England
[5] Univ Toronto, Dept Med, Toronto, ON, Canada
[6] Univ Hlth Network, Toronto, ON, Canada
[7] Peter Munk Cardiac Ctr, Toronto, ON, Canada
[8] Univ Utrecht, Utrecht, Netherlands
[9] Netherlands Org Appl Sci Res, Leiden, Netherlands
Funding
Canadian Institutes of Health Research; UK Medical Research Council
Keywords
VALIDATION; MORTALITY;
DOI
10.1016/j.cjca.2020.11.010
Chinese Library Classification (CLC) number
R5 [Internal Medicine]
Subject classification code
1002 ; 100201 ;
Abstract
Missing data is a common occurrence in clinical research. Missing data occurs when the values of the variables of interest are not measured or recorded for all subjects in the sample. Common approaches to addressing the presence of missing data include complete-case analyses, where subjects with missing data are excluded, and mean-value imputation, where missing values are replaced with the mean value of that variable in those subjects for whom it is not missing. However, in many settings, these approaches can lead to biased estimates of statistics (eg, of regression coefficients) and/or confidence intervals that are artificially narrow. Multiple imputation (MI) is a popular approach for addressing the presence of missing data. With MI, multiple plausible values of a given variable are imputed or filled in for each subject who has missing data for that variable. This results in the creation of multiple completed data sets. Identical statistical analyses are conducted in each of these completed data sets and the results are pooled across the completed data sets. We provide an introduction to MI and discuss issues in its implementation, including developing the imputation model, how many imputed data sets to create, and addressing derived variables. We illustrate the application of MI through an analysis of data on patients hospitalised with heart failure. We focus on developing a model to estimate the probability of 1-year mortality in the presence of missing data. Statistical software code for conducting MI in R, SAS, and Stata is provided.
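The pooling step described in the abstract can be illustrated with a minimal sketch. It assumes the results are combined with Rubin's rules, the usual pooling method for MI; the abstract itself does not name the method, and this is not the article's own R, SAS, or Stata code. The function name `pool_rubin` and all numbers are hypothetical, chosen only for illustration.

```python
import math
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m completed data sets (Rubin's rules).

    estimates: point estimate from each completed data set
    variances: squared standard error from each completed data set
    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least 2 imputed data sets")
    pooled_estimate = mean(estimates)          # average of the m estimates
    within = mean(variances)                   # average within-imputation variance
    between = variance(estimates)              # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between     # total variance of the pooled estimate
    return pooled_estimate, math.sqrt(total)


# Hypothetical regression coefficients and squared standard errors
# from m = 5 completed data sets (made-up values, not from the article).
estimates = [0.50, 0.54, 0.46, 0.52, 0.48]
variances = [0.010, 0.011, 0.009, 0.010, 0.010]

pooled_estimate, pooled_se = pool_rubin(estimates, variances)
print(f"pooled estimate = {pooled_estimate:.3f}, pooled SE = {pooled_se:.4f}")
```

The between-imputation term is what separates this from single or mean-value imputation: it carries the uncertainty about the missing values into the standard error, so the resulting confidence interval is not artificially narrow.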
Pages: 1322-1331
Page count: 10
Related papers
30 in total
[1]   Reporting missing participant data in randomised trials: systematic survey of the methodological literature and a proposed guide [J].
Akl, Elie A. ;
Shawwa, Khaled ;
Kahale, Lara A. ;
Agoritsas, Thomas ;
Brignardello-Petersen, Romina ;
Busse, Jason W. ;
Carrasco-Labra, Alonso ;
Ebrahim, Shanil ;
Johnston, Bradley C. ;
Neumann, Ignacio ;
Sola, Ivan ;
Sun, Xin ;
Vandvik, Per ;
Zhang, Yuqing ;
Alonso-Coello, Pablo ;
Guyatt, Gordon H. .
BMJ OPEN, 2015, 5 (12)
[2]  
[Anonymous], 1987, Multiple Imputations for Non Response in Surveys
[3]  
[Anonymous], 2007, 5 C ART INT APPL ENV
[4]   Multiple Imputation for Multilevel Data with Continuous and Binary Variables [J].
Audigier, Vincent ;
White, Ian R. ;
Jolani, Shahab ;
Debray, Thomas P. A. ;
Quartagno, Matteo ;
Carpenter, James ;
van Buuren, Stef ;
Resche-Rigon, Matthieu .
STATISTICAL SCIENCE, 2018, 33 (02) :160-183
[5]  
Carpenter JR, 2013, Multiple Imputation and Its Application, DOI 10.1002/9781119942283
[6]   Missing covariate data in clinical research: when and when not to use the missing-indicator method for analysis [J].
Groenwold, Rolf H. H. ;
White, Ian R. ;
Donders, Rogier T. ;
Carpenter, James R. ;
Altman, Douglas G. ;
Moons, Karel G. M. .
CANADIAN MEDICAL ASSOCIATION JOURNAL, 2012, 184 (11) :1265-1269
[7]   Prospective Validation of the Emergency Heart Failure Mortality Risk Grade for Acute Heart Failure The ACUTE Study [J].
Lee, Douglas S. ;
Lee, Jacques S. ;
Schull, Michael J. ;
Borgundvaag, Bjug ;
Edmonds, Marcia L. ;
Ivankovic, Maria ;
McLeod, Shelley L. ;
Dreyer, Jonathan F. ;
Sabbah, Sam ;
Levy, Phillip D. ;
O'Neill, Tara ;
Chong, Alice ;
Stukel, Therese A. ;
Austin, Peter C. ;
Tu, Jack V. .
CIRCULATION, 2019, 139 (09) :1146-1156
[8]   Predicting mortality among patients hospitalized for heart failure - Derivation and validation of a clinical model [J].
Lee, DS ;
Austin, PC ;
Rouleau, JL ;
Liu, PP ;
Naimark, D ;
Tu, JV .
JAMA-JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION, 2003, 290 (19) :2581-2587
[9]  
Little RJA, 2002, Statistical Analysis With Missing Data, 2nd ed., DOI 10.1002/9781119013563
[10]  
Longford N.T., 2008, Handbook of multilevel analysis, P377