How handling missing data may impact conclusions: A comparison of six different imputation methods for categorical questionnaire data

Cited by: 58
Authors
Stavseth, Marianne Riksheim [1]
Clausen, Thomas [1]
Roislien, Jo [1, 2]
Affiliations
[1] Univ Oslo, Norwegian Ctr Addict Res, Inst Clin Med, N-0315 Oslo, Norway
[2] Univ Stavanger, Fac Hlth Sci, Stavanger, Norway
Keywords
Missing data; categorical data; multiple imputation; hot deck imputation; multiple correspondence analysis; complete case analysis; random forests; latent class analysis; maintenance treatment; incomplete data; methadone; regression; discrete; values; bias
DOI
10.1177/2050312118822912
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Objectives: Missing data is a recurrent issue in many fields of medical research, particularly in questionnaires. The aim of this article is to describe and compare six conceptually different multiple imputation methods, alongside the commonly used complete case analysis, and to explore whether the choice of method for handling missing data might affect the clinical conclusions drawn from a regression model when the data are categorical. Methods: In addition to complete case analysis, we tested the following six imputation methods: multiple imputation using expectation-maximization with bootstrapping, multiple imputation using multiple correspondence analysis, multiple imputation using latent class analysis, multiple hot deck imputation, and multivariate imputation by chained equations with two different model specifications: logistic regression and random forests. The methods were tested on real data from a questionnaire-based study in the Norwegian opioid maintenance treatment programme. Results: All methods performed relatively well when the sample size was large (n = 1000). For a smaller sample size (n = 200), the regression estimates depended heavily on the proportion of missing data. When more than 20% of the data were missing, complete case analysis, hot deck and random forests in particular gave biased estimates with too low coverage. Multiple imputation using multiple correspondence analysis had the best overall performance. Conclusion: The choice of method for handling missing data has a significant impact on the clinical interpretation of the accompanying statistical analyses. With missing data, the choice of whether to impute or not, and the choice of imputation method, can influence the clinical conclusions drawn from a regression model and should therefore be given sufficient consideration.
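To make the contrast between complete case analysis and multiple hot deck imputation concrete, the following is a minimal sketch in Python, not the authors' implementation. The toy records, the variable names (`employed`, `crime`) and all function names are hypothetical. Donors are drawn completely at random from the observed values of each variable, which is a simplification: practical hot deck procedures usually match donors on other covariates. Only the pooled point estimate is shown; variance pooling is omitted.

```python
import random
from statistics import mean


def complete_cases(rows):
    """Complete case analysis: keep only rows with no missing (None) values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def hot_deck_impute(rows, rng):
    """Fill each missing value with the observed value of a randomly drawn donor.

    Assumption: donors are sampled uniformly from all rows that observed the
    variable, without matching on other covariates.
    """
    imputed = []
    for r in rows:
        filled = dict(r)
        for key, value in r.items():
            if value is None:
                donors = [d[key] for d in rows if d[key] is not None]
                filled[key] = rng.choice(donors)
        imputed.append(filled)
    return imputed


def multiple_hot_deck(rows, m, seed=0):
    """Create m independently imputed copies of the data set."""
    rng = random.Random(seed)
    return [hot_deck_impute(rows, rng) for _ in range(m)]


def proportion(rows, key, level):
    """Share of rows in which the categorical variable `key` equals `level`."""
    return mean(1.0 if r[key] == level else 0.0 for r in rows)


# Hypothetical toy questionnaire data; None marks a missing answer.
data = [
    {"employed": "yes", "crime": "no"},
    {"employed": "no", "crime": "yes"},
    {"employed": None, "crime": "no"},
    {"employed": "yes", "crime": None},
    {"employed": "no", "crime": "no"},
    {"employed": "yes", "crime": "no"},
]

# Complete case analysis discards the two incomplete rows.
cc = complete_cases(data)
cc_estimate = proportion(cc, "crime", "yes")

# Multiple imputation keeps all rows; the point estimate is the average
# of the per-imputation estimates.
imputations = multiple_hot_deck(data, m=5)
pooled_estimate = mean(proportion(d, "crime", "yes") for d in imputations)
```

In this sketch the complete case estimate uses four of the six records, whereas each imputed data set retains all six; with real data, the two approaches can therefore lead to different regression estimates, which is the comparison the abstract describes.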
Pages: 12