Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

Cited by: 136

Authors:

Blankers, Matthijs [1,2]

Koeter, Maarten W. J. [1]

Schippers, Gerard M. [1,2]

Affiliations:

[1] Univ Amsterdam, Acad Med Ctr, AIAR, Dept Psychiat, NL-1100 DD Amsterdam, Netherlands

[2] Arkin Acad, Amsterdam, Netherlands

Source:

JOURNAL OF MEDICAL INTERNET RESEARCH | 2010 / Volume 12 / Issue 05

Keywords:

Missing data; multiple imputation; Internet; methodology; MULTIPLE IMPUTATION; SUBSTANCE USE; ATTRITION; SAMPLE; PRIMER;

DOI:

10.2196/jmir.1448

Chinese Library Classification number:

R19 [Health care organization and services (health services administration)];

Discipline classification code:

Abstract:

Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.

Objective: In this paper, several statistical approaches to data "missingness" are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in this analysis, and their strengths and weaknesses are discussed.

Methods: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study.

Results: In the simulation study, the use of multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II outperformed the others: it led to the smallest deviation from the reference value (Cohen's d = 0.06) and had the largest coverage percentage of the reference confidence interval (96%).

Conclusions: The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (last observation carried forward and complete case analysis) did not perform well, and hence we recommend not using them. Recent versions of several widely used statistical software programs increasingly support the analysis of multiply imputed datasets, which makes multiple imputation more readily available to less mathematically inclined researchers.
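The simulation design summarized in the abstract (induce MAR missingness in about half of the cases, then compare how far each approach lands from a reference value) can be illustrated with a minimal sketch. This is not the authors' procedure or data: the synthetic normally distributed variables, the missingness probabilities (0.8 and 0.2), and the function names `fit_line` and `compare_methods` are assumptions chosen for illustration, and only three of the simpler approaches are shown, without the bootstrapping step.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((a - mx) ** 2 for a in xs)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def compare_methods(n=2000, seed=1):
    rng = random.Random(seed)

    # Hypothetical data (assumption): x is a fully observed covariate,
    # y is an outcome that depends on x. The study's real endpoints were
    # nonnormal counts; normal data keeps this sketch short.
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [10 + 3 * xi + rng.gauss(0, 2) for xi in x]

    # Induce MAR: the chance that y is missing depends only on the
    # observed x (0.8 when x > 0, else 0.2), giving roughly 50% missing.
    y_mis = [
        None if rng.random() < (0.8 if xi > 0 else 0.2) else yi
        for xi, yi in zip(x, y)
    ]

    obs_x = [xi for xi, yi in zip(x, y_mis) if yi is not None]
    obs_y = [yi for yi in y_mis if yi is not None]

    # Reference value: the mean of y before any values were removed.
    reference = statistics.fmean(y)

    # Complete case analysis: use only the rows where y is observed.
    complete_case = statistics.fmean(obs_y)

    # Mean imputation: fill every gap with the observed mean.
    mean_filled = [complete_case if yi is None else yi for yi in y_mis]

    # Single regression imputation: predict missing y from x.
    slope, intercept = fit_line(obs_x, obs_y)
    reg_filled = [
        intercept + slope * xi if yi is None else yi
        for xi, yi in zip(x, y_mis)
    ]

    return {
        "missing_fraction": sum(yi is None for yi in y_mis) / n,
        "reference": reference,
        "complete_case": complete_case,
        "mean_imputation": statistics.fmean(mean_filled),
        "regression_imputation": statistics.fmean(reg_filled),
    }


if __name__ == "__main__":
    for name, value in compare_methods().items():
        print(f"{name}: {value:.3f}")
```

In this setup, mean imputation reproduces the complete case estimate of the mean, so both inherit the same bias, whereas regression imputation uses the observed covariate and lands closer to the reference. Single regression imputation still understates uncertainty, which is the gap that multiple imputation is designed to close.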


Pages: e54p.1 / e54p.11

Page count: 11
