Missing Value Imputation Methods for Electronic Health Records

Cited by: 15
Authors
Psychogyios, Konstantinos [1]
Ilias, Loukas [1]
Ntanos, Christos [1]
Askounis, Dimitris [1]
Affiliations
[1] National Technical University of Athens, School of Electrical and Computer Engineering, Decision Support Systems Laboratory, Athens 15780, Greece
Keywords
Training; Task analysis; Deep learning; Heart; Electronic medical records; Noise reduction; Generative adversarial networks; Missing value imputation; Autoencoders; Missing data; EHR; Multiple imputation
DOI
10.1109/ACCESS.2023.3251919
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Electronic health records (EHR) are patient-level information, e.g., laboratory tests and questionnaires, stored in electronic format. Compared to physical records, the EHR alternative allows patients to access their data easily and helps staff with management procedural tasks such as information sharing across different organizations. Moreover, this type of data is commonly used by researchers for predictive and classification purposes, employing statistical and machine learning methods. However, missingness is a phenomenon that is observed very frequently for such measurements. Even though this missingness is often significant, it is usually treated poorly with either case deletion or simple methods, resulting in suboptimal and/or inaccurate predictive results. This happens because the simple methods, e.g., k-nearest neighbors (kNN) and mean/mode imputation, fail in most cases to incorporate the complex relationships that define these medical datasets. To address these limitations, in this paper we test and improve state-of-the-art missing data imputation models and practices. We propose a new missing value imputation method based on denoising autoencoders (DAE) with kNN for the pre-imputation task. We optimize the training methodology by re-applying kNN to the missing data every N epochs using a different value for the variable k each time to yield more accurate results. We also revise a state-of-the-art missing data imputation approach based on a generative adversarial network (GAN). Using this as a baseline, we introduce improvements regarding both the architecture and the training procedure. These models are compared with the ones usually employed within clinical research studies for both the task of imputation and post-imputation prediction. Results show that our proposed deep learning approaches outperform the standard baselines, yielding better imputation and predictive results.
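The training scheme described in the abstract (kNN pre-imputation feeding a denoising autoencoder, with kNN re-applied to the missing entries every N epochs using a different k) can be illustrated with the rough NumPy sketch below. It is not the authors' implementation. The function names `knn_impute` and `train_dae`, the single tanh hidden layer, the masking noise, the loss restricted to observed entries, and every hyperparameter value are assumptions made only for illustration, and the data is assumed to be numeric and roughly standardized.

```python
import numpy as np


def knn_impute(X, k):
    """Fill each NaN with the mean of that feature over the k nearest rows
    that observe it (distance = mean squared difference on shared features)."""
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    filled = X.copy()
    X0 = np.where(observed, X, 0.0)
    # Fallback value when no usable neighbour exists for a feature.
    col_means = X0.sum(axis=0) / np.maximum(observed.sum(axis=0), 1)
    for i in range(X.shape[0]):
        if observed[i].all():
            continue
        shared = observed & observed[i]  # features observed in both rows
        n_shared = shared.sum(axis=1)
        sq = ((X0 - X0[i]) ** 2 * shared).sum(axis=1)
        dist = np.where(n_shared > 0, sq / np.maximum(n_shared, 1), np.inf)
        dist[i] = np.inf  # a row is never its own neighbour
        order = np.argsort(dist)
        for j in np.where(~observed[i])[0]:
            donors = [r for r in order
                      if observed[r, j] and np.isfinite(dist[r])][:k]
            filled[i, j] = X[donors, j].mean() if donors else col_means[j]
    return filled


def train_dae(X, k_schedule=(3, 5, 7), refresh_every=50, epochs=150,
              hidden=8, lr=0.05, drop=0.2, seed=0):
    """Train a tiny denoising autoencoder on kNN pre-imputed data.
    All defaults are illustrative assumptions, not values from the paper."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    observed = ~np.isnan(X)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    for epoch in range(epochs):
        if epoch % refresh_every == 0:
            # Re-apply kNN to the originally missing cells with a new k.
            k = k_schedule[(epoch // refresh_every) % len(k_schedule)]
            target = knn_impute(X, k)
        # Denoising step: randomly zero out a fraction of the inputs.
        noisy = target * (rng.random((n, d)) > drop)
        H = np.tanh(noisy @ W1 + b1)
        out = H @ W2 + b2
        # Squared-error gradient, counted on observed entries only (assumption).
        err = (out - target) * observed / max(observed.sum(), 1)
        dH = (err @ W2.T) * (1.0 - H ** 2)
        W2 -= lr * (H.T @ err)
        b2 -= lr * err.sum(axis=0)
        W1 -= lr * (noisy.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    recon = np.tanh(target @ W1 + b1) @ W2 + b2
    # Keep observed values; use the reconstruction only where data was missing.
    return np.where(observed, X, recon)
```

Calling `train_dae(X)` on a two-dimensional array with `np.nan` marking missing cells returns an array of the same shape with the observed values untouched and the missing cells filled by the autoencoder's reconstruction. Cycling through `k_schedule` is one simple way to vary k at each refresh; the abstract does not say how the values of k are chosen.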
Pages: 21562-21574
Page count: 13