Characterizing and Managing Missing Structured Data in Electronic Health Records: Data Analysis

Cited by: 91
Authors
Beaulieu-Jones, Brett K. [1,2]
Lavage, Daniel R. [3]
Snyder, John W. [3]
Moore, Jason H. [2]
Pendergrass, Sarah A. [3]
Bauer, Christopher R. [3]
Affiliations
[1] Univ Penn, Perelman Sch Med, Genom & Comp Biol Grad Grp, Philadelphia, PA 19104 USA
[2] Univ Penn, Inst Biomed Informat, Philadelphia, PA 19104 USA
[3] Geisinger, Biomed & Translat Informat Inst, 100 N Acad Ave, Danville, PA 17822 USA
Funding
National Institutes of Health (NIH), USA
Keywords
imputation; missing data; clinical laboratory test results; electronic health records; multiple imputation; sensitivity analysis
DOI
10.2196/medinform.8960
Chinese Library Classification (CLC) number
R-058
Discipline classification code
Abstract
Background: Missing data is a challenge for all studies; however, this is especially true for electronic health record (EHR)-based analyses. Failure to appropriately consider missing data can lead to biased results. While there has been extensive theoretical work on imputation, and many sophisticated methods are now available, it remains quite challenging for researchers to implement these methods appropriately. Here, we provide detailed procedures for when and how to conduct imputation of EHR laboratory results.
Objective: The objective of this study was to demonstrate how the mechanism of missingness can be assessed, evaluate the performance of a variety of imputation methods, and describe some of the most frequent problems that can be encountered.
Methods: We analyzed clinical laboratory measures from 602,366 patients in the EHR of Geisinger Health System in Pennsylvania, USA. Using these data, we constructed a representative set of complete cases and assessed the performance of 12 different imputation methods for missing data that was simulated based on 4 mechanisms of missingness (missing completely at random, missing not at random, missing at random, and real data modelling).
Results: Our results showed that several methods, including variations of Multivariate Imputation by Chained Equations (MICE) and softImpute, consistently imputed missing values with low error; however, only a subset of the MICE methods was suitable for multiple imputation.
Conclusions: The analyses we describe provide an outline of considerations for dealing with missing EHR data, steps that researchers can perform to characterize missingness within their own data, and an evaluation of methods that can be applied to impute clinical data. While the performance of methods may vary between datasets, the process we describe can be generalized to the majority of structured data types that exist in EHRs, and all of our methods and code are publicly available.
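The evaluation design summarized in the abstract (start from complete cases, simulate missingness under a known mechanism, impute, and score the error on the hidden entries) can be illustrated with a minimal sketch. This is not the authors' published code: the function names, the synthetic Gaussian data, the 20% missingness rate, and the use of column-mean imputation as a stand-in for the 12 evaluated methods are all assumptions made for illustration, and only two of the four mechanisms are shown.

```python
import numpy as np

def mask_mcar(X, frac, rng):
    """Missing completely at random: every entry is hidden with probability `frac`."""
    return rng.random(X.shape) < frac

def mask_mnar(X, frac, rng):
    """Missing not at random (illustrative): larger values are more likely to be hidden.

    Each column is converted to ranks scaled to [0, 1]; the hiding probability is
    2 * frac * rank, which averages to roughly `frac` over the column.
    """
    ranks = X.argsort(axis=0).argsort(axis=0) / (X.shape[0] - 1)
    return rng.random(X.shape) < 2 * frac * ranks

def mean_impute(X_missing):
    """Baseline imputer (an assumption, used only as a simple stand-in method)."""
    col_means = np.nanmean(X_missing, axis=0)
    return np.where(np.isnan(X_missing), col_means, X_missing)

def rmse_on_masked(X_true, X_imputed, mask):
    """Root mean squared error, computed only on the entries that were hidden."""
    return float(np.sqrt(np.mean((X_true[mask] - X_imputed[mask]) ** 2)))

def evaluate(X, mask):
    """Hide the masked entries, impute them, and score against the known truth."""
    X_missing = np.where(mask, np.nan, X)
    return rmse_on_masked(X, mean_impute(X_missing), mask)

rng = np.random.default_rng(0)
X = rng.normal(size=(20000, 4))  # synthetic "complete cases", not EHR data

results = {
    "MCAR": evaluate(X, mask_mcar(X, 0.2, rng)),
    "MNAR": evaluate(X, mask_mnar(X, 0.2, rng)),
}
print(results)
```

In this toy setting, mean imputation should score worse under the value-dependent mechanism than under MCAR, because the observed values no longer represent the hidden ones. That is the kind of mechanism-dependent behavior the simulation design is meant to expose.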
Pages: 12
Related papers
50 in total
  • [21] Analysis of repeated binary data: sensitivity to missing data
    Minini, P
    Chavance, M
    REVUE D EPIDEMIOLOGIE ET DE SANTE PUBLIQUE, 2004, 52 (05) : 455 - 464
  • [22] Quantile regression for nonignorable missing data with its application of analyzing electronic medical records
    Yu, Aiai
    Zhong, Yujie
    Feng, Xingdong
    Wei, Ying
    BIOMETRICS, 2023, 79 (03) : 2036 - 2049
  • [23] Missing Value Imputation Methods for Electronic Health Records
    Psychogyios, Konstantinos
    Ilias, Loukas
    Ntanos, Christos
    Askounis, Dimitris
    IEEE ACCESS, 2023, 11 : 21562 - 21574
  • [24] Analysis of missing data in electronic health records of people with diabetes in primary care in Spain: A population-based cohort study
    Quesada, Jose Antonio
    Orozco-Beltran, Domingo
    INTERNATIONAL JOURNAL OF MEDICAL INFORMATICS, 2025, 194
  • [25] Investigating Bias from Missing Data in an Electronic Health Records-Based Study of Weight Loss After Bariatric Surgery
    Koffman, Lily
    Levis, Alexander W.
    Arterburn, David
    Coleman, Karen J.
    Herrinton, Lisa J.
    Cooper, Julie
    Ewing, John
    Fischer, Heidi
    Fraser, James R.
    Johnson, Eric
    Taylor, Brianna
    Theis, Mary Kay
    Liu, Liyan
    Courcoulas, Anita
    Li, Robert
    Fisher, David P.
    Amsden, Laura
    Haneuse, Sebastien
    OBESITY SURGERY, 2021, 31 (05) : 2125 - 2135
  • [27] Missing Data Analysis Using Statistical and Machine Learning Methods in Facility-Based Maternal Health Records
    Memon S.M.Z.
    Wamala R.
    Kabano I.H.
    SN Computer Science, 3 (5)
  • [28] Big data and electronic health records for glaucoma research
    Bernstein, Isaac A.
    Fernandez, Karen S.
    Stein, Joshua D.
    Pershing, Suzann
    Wang, Sophia Y.
    TAIWAN JOURNAL OF OPHTHALMOLOGY, 2024, 14 (03) : 352 - 359
  • [29] Use of Data from Electronic Health Records for Pharmacoepidemiology
    Michael D. Murray
    Current Epidemiology Reports, 2014, 1 (4) : 186 - 193
  • [30] A Study Into the Impact of Data Breaches of Electronic Health Records
    Pilla, Ravi
    Oseni, Taiwo
    Stranieri, Andrew
    PROCEEDINGS OF 2023 AUSTRALIAN COMPUTER SCIENCE WEEK, ACSW 2023, 2023 : 252 - 254