Missing Data Approaches in eHealth Research: Simulation Study and a Tutorial for Nonmathematically Inclined Researchers

Cited by: 136
Authors
Blankers, Matthijs [1, 2]
Koeter, Maarten W. J. [1]
Schippers, Gerard M. [1, 2]
Affiliations
[1] Univ Amsterdam, Acad Med Ctr, AIAR, Dept Psychiat, NL-1100 DD Amsterdam, Netherlands
[2] Arkin Acad, Amsterdam, Netherlands
Keywords
Missing data; multiple imputation; Internet; methodology; MULTIPLE IMPUTATION; SUBSTANCE USE; ATTRITION; SAMPLE; PRIMER;
DOI
10.2196/jmir.1448
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)]
Abstract
Background: Missing data is a common nuisance in eHealth research: it is hard to prevent and may invalidate research findings.
Objective: In this paper, several statistical approaches to data "missingness" are discussed and tested in a simulation study. Basic approaches (complete case analysis, mean imputation, and last observation carried forward [LOCF]) and advanced methods (expectation maximization, regression imputation, and multiple imputation) are included in this analysis, and their strengths and weaknesses are discussed.
Methods: The dataset used for the simulation was obtained from a prospective cohort study following participants in an online self-help program for problem drinkers. It contained 124 nonnormally distributed endpoints, that is, daily alcohol consumption counts of the study respondents. Missingness at random (MAR) was induced in a selected variable for 50% of the cases. Validity, reliability, and coverage of the estimates obtained using the different imputation methods were calculated by performing a bootstrapping simulation study.
Results: In the simulation study, the use of multiple imputation techniques led to accurate results. Differences were found between the 4 tested multiple imputation programs: NORM, MICE, Amelia II, and SPSS MI. Among the tested approaches, Amelia II outperformed the others: it led to the smallest deviation from the reference value (Cohen's d = 0.06) and had the largest coverage percentage of the reference confidence interval (96%).
Conclusions: The use of multiple imputation improves the validity of the results when analyzing datasets with missing observations. Some of the often-used approaches (LOCF, complete case analysis) did not perform well, and hence we recommend not using them. More recent versions of some widely used statistical software programs increasingly support the analysis of multiply imputed datasets, making multiple imputation more readily available to less mathematically inclined researchers.
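The contrast the abstract draws between basic and advanced approaches under MAR can be illustrated with a small sketch. This is not the authors' simulation: the data are synthetic and normally distributed (the paper used nonnormal alcohol consumption counts), the function names and parameters are hypothetical, and the regression step is a single deterministic imputation rather than the multiple imputation performed by NORM, MICE, Amelia II, or SPSS MI.

```python
import numpy as np


def simulate(n=2000, seed=0):
    """Synthetic data (an assumption, not the paper's dataset)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)                    # fully observed covariate
    y = 2.0 + 1.5 * x + rng.normal(size=n)    # variable that will lose values
    # MAR: the chance that y is missing depends only on the observed x.
    p_miss = 1.0 / (1.0 + np.exp(-2.0 * x))   # averages to roughly 50%
    missing = rng.random(n) < p_miss
    return x, y, missing


def complete_case_mean(y, missing):
    """Drop every case whose y is missing."""
    return y[~missing].mean()


def mean_imputation_mean(y, missing):
    """Replace each missing y with the mean of the observed y values."""
    filled = y.copy()
    filled[missing] = y[~missing].mean()
    return filled.mean()


def regression_imputation_mean(x, y, missing):
    """Replace each missing y with its prediction from a line fitted on observed cases."""
    slope, intercept = np.polyfit(x[~missing], y[~missing], 1)
    filled = y.copy()
    filled[missing] = intercept + slope * x[missing]
    return filled.mean()


if __name__ == "__main__":
    x, y, missing = simulate()
    print("share missing        :", missing.mean())
    print("full-data mean       :", y.mean())
    print("complete case mean   :", complete_case_mean(y, missing))
    print("mean imputation mean :", mean_imputation_mean(y, missing))
    print("regression imputation:", regression_imputation_mean(x, y, missing))
```

In this setup the complete case and mean imputation estimates coincide, and both drift away from the full-data mean because cases with high x are missing more often. The regression-based estimate uses the observed covariate and stays much closer, which is the kind of gain that model-based imputation is meant to deliver.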
Pages: e54p.1 / e54p.11
Page count: 11
Related papers
50 in total
  • [21] Clustering with missing and left-censored data: A simulation study comparing multiple-imputation-based procedures
    Faucheux, Lilith
    Resche-Rigon, Matthieu
    Curis, Emmanuel
    Soumelis, Vassili
    Chevret, Sylvie
    BIOMETRICAL JOURNAL, 2021, 63 (02) : 372 - 393
  • [22] Missing Data in Marginal Structural Models A Plasmode Simulation Study Comparing Multiple Imputation and Inverse Probability Weighting
    Liu, Shao-Hsien
    Chrysanthopoulou, Stavroula A.
    Chang, Qiuzhi
    Hunnicutt, Jacob N.
    Lapane, Kate L.
    MEDICAL CARE, 2019, 57 (03) : 237 - 243
  • [23] Handling of outcome missing data dependent on measured or unmeasured background factors in micro-randomized trial: Simulation and application study
    Kondo, Masahiro
    Oba, Koji
    DIGITAL HEALTH, 2024, 10
  • [24] A simulation study on missing data imputation for dichotomous variables using statistical and machine learning methods
    Ge, Yingfeng
    Li, Zhiwei
    Zhang, Jinxin
    SCIENTIFIC REPORTS, 2023, 13 (01)
  • [25] Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study
    Marshall, Andrea
    Altman, Douglas G.
    Royston, Patrick
    Holder, Roger L.
    BMC MEDICAL RESEARCH METHODOLOGY, 2010, 10
  • [26] Evaluation of bias and precision in methods of analysis for pragmatic trials with missing outcome data: a simulation study
    Royes Joseph
    Julius Sim
    Reuben Ogollah
    Martyn Lewis
    Trials, 14 (Suppl 1)
  • [27] The impact of missing data on analyses of a time-dependent exposure in a longitudinal cohort: A simulation study
    Karahalios A.
    Baglietto L.
    Lee K.J.
    English D.R.
    Carlin J.B.
    Simpson J.A.
    Emerging Themes in Epidemiology, 10 (1):
  • [28] Comparison of techniques for handling missing covariate data within prognostic modelling studies: a simulation study
    Andrea Marshall
    Douglas G Altman
    Patrick Royston
    Roger L Holder
    BMC Medical Research Methodology, 10
  • [29] Handling missing data in a composite outcome with partially observed components: simulation study based on clustered paediatric routine data
    Gachau, Susan
    Njagi, Edmund Njeru
    Owuor, Nelson
    Mwaniki, Paul
    Quartagno, Matteo
    Sarguta, Rachel
    English, Mike
    Ayieko, Philip
    JOURNAL OF APPLIED STATISTICS, 2022, 49 (09) : 2389 - 2402
  • [30] A study of hybrid neural network approaches and the effects of missing data on traffic forecasting
    Chen, HB
    Grant-Muller, S
    Mussone, L
    Montgomery, F
    NEURAL COMPUTING & APPLICATIONS, 2001, 10 (03) : 277 - 286