EMR-LIP: A lightweight framework for standardizing the preprocessing of longitudinal irregular data in electronic medical records

Times cited: 0
Authors
Luo, Jiawei [1, 2, 3]
Huang, Shixin [4, 5]
Lan, Lan [6]
Yang, Shu [7]
Cao, Tingqian [8]
Yin, Jin [1, 2, 3]
Qiu, Jiajun [1, 2, 3]
Yang, Xiaoyan [1, 2, 3]
Guo, Yingqiang [1]
Zhou, Xiaobo [9]
Affiliations
[1] Sichuan Univ, West China Hosp, West China Sch Med, Dept Cardiovasc Surg, Chengdu 610041, Sichuan, Peoples R China
[2] Sichuan Univ, West China Hosp, West China Biomed Big Data Ctr, West China Sch Med, Chengdu 610041, Sichuan, Peoples R China
[3] Sichuan Univ, Medx Ctr Informat, Chengdu 610041, Peoples R China
[4] Peoples Hosp Yubei Dist Chongqing, Dept Sci Res, Chongqing 401120, Peoples R China
[5] Chongqing Univ Posts & Telecommun, Sch Commun & Informat Engn, Chongqing 400065, Peoples R China
[6] Capital Med Univ, Beijing Tiantan Hosp, IT Ctr, Beijing 100070, Peoples R China
[7] Chengdu Univ Tradit Chinese Med, Coll Med Informat Engn, Chengdu 610075, Peoples R China
[8] Sichuan Univ, West China Hosp, Integrated Care Management Ctr, Chengdu 610041, Peoples R China
[9] Univ Texas, Ctr Computat Syst Med, McWilliams Sch Biomed Informat, Hlth Sci Ctr Houston, Houston, TX 77030 USA
Funding
National Natural Science Foundation of China;
Keywords
Electronic medical records; Longitudinal data; Irregular data; Preprocessing pipeline; Deep learning; PREDICTION; SEPSIS; MODEL;
DOI
10.1016/j.cmpb.2024.108521
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification codes
081203; 0835;
Abstract
Objective: Longitudinal data from Electronic Medical Records (EMRs) are increasingly used to build predictive models for clinical tasks, offering richer insight into patient health. However, because no universally accepted tools or standardization methods exist, studies preprocess irregular and complex EMR data in markedly different ways. This study introduces the Electronic Medical Record Longitudinal Irregular Data Preprocessing (EMR-LIP) framework, a lightweight approach to preprocessing longitudinal, irregular EMR data that aims to improve research efficiency, consistency, reproducibility, and comparability. Materials and Methods: EMR-LIP modularizes the preprocessing of longitudinal irregular EMR data and provides tools with a low level of encapsulation. Compared with other pipelines, EMR-LIP categorizes variables more granularly and applies preprocessing techniques designed specifically for each variable type. To demonstrate its versatility, EMR-LIP was applied in an empirical study to two public EMR databases, MIMIC-IV and eICU-CRD. The data processed with EMR-LIP were then used to evaluate several well-known deep learning models on a range of commonly used benchmark tasks. Results: In both MIMIC-IV and eICU-CRD, models built on EMR-LIP achieved better baseline performance than those reported in previous studies. Notably, with data preprocessed by EMR-LIP, traditional models such as LSTM and GRU outperformed more complex models, reaching an AUROC of up to 0.94 for in-hospital mortality prediction. Models based on EMR-LIP also performed stably across different resampling intervals and showed greater fairness in performance across ethnic groups. Conclusion: EMR-LIP streamlines the preprocessing of irregular longitudinal EMR data, offers an end-to-end solution for producing model-ready data, and has been open-sourced for collaborative refinement by the research community.
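The abstract describes turning irregular longitudinal observations into model-ready data by resampling them at fixed intervals, with preprocessing tailored to each variable type. The sketch below illustrates that general idea in plain Python under stated assumptions; it is not EMR-LIP's code or API. The three variable categories (`continuous`, `categorical`, `binary`), their aggregation and fill rules, the function name `resample`, and the example variables are all hypothetical choices made for illustration, and EMR-LIP's own taxonomy is described as more granular.

```python
import math
from typing import Dict, List, Optional, Tuple

# (hours since admission, variable name, value); a hypothetical record layout.
Observation = Tuple[float, str, float]


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


# Hypothetical per-type rules: (how to aggregate a window, how to fill an empty window).
RULES = {
    "continuous": (_mean, "ffill"),                 # e.g. heart rate: window average
    "categorical": (lambda v: v[-1], "ffill"),      # e.g. a GCS component: latest value
    "binary": (lambda v: float(any(v)), "zero"),    # e.g. a drug event: 1 if it occurred
}


def resample(
    observations: List[Observation],
    var_types: Dict[str, str],
    interval_h: float,
    horizon_h: float,
) -> Tuple[List[Dict[str, Optional[float]]], List[Dict[str, int]]]:
    """Bin irregular observations into fixed windows; return values and observation masks."""
    n_bins = math.ceil(horizon_h / interval_h)
    buckets: List[Dict[str, List[float]]] = [
        {var: [] for var in var_types} for _ in range(n_bins)
    ]
    # Sort by time so "latest value" aggregation sees observations in order.
    for t, var, value in sorted(observations, key=lambda o: o[0]):
        if var in var_types and 0 <= t < horizon_h:
            buckets[int(t // interval_h)][var].append(value)

    values: List[Dict[str, Optional[float]]] = []
    masks: List[Dict[str, int]] = []
    last: Dict[str, Optional[float]] = {var: None for var in var_types}
    for bucket in buckets:
        row: Dict[str, Optional[float]] = {}
        mask: Dict[str, int] = {}
        for var, vtype in var_types.items():
            aggregate, fill = RULES[vtype]
            if bucket[var]:
                row[var] = aggregate(bucket[var])
                mask[var] = 1  # observed in this window
            else:
                # Forward-fill (None if never seen yet) or treat absence as 0.
                row[var] = last[var] if fill == "ffill" else 0.0
                mask[var] = 0  # imputed, not observed
            last[var] = row[var]
        values.append(row)
        masks.append(mask)
    return values, masks


# Usage with made-up observations over a 3-hour horizon at 1-hour resolution.
obs = [
    (0.5, "heart_rate", 80.0),
    (0.9, "heart_rate", 90.0),
    (2.2, "heart_rate", 100.0),
    (1.1, "vasopressor", 1.0),
]
types = {"heart_rate": "continuous", "vasopressor": "binary"}
vals, masks = resample(obs, types, interval_h=1.0, horizon_h=3.0)
print([row["heart_rate"] for row in vals])  # → [85.0, 85.0, 100.0]
```

Changing `interval_h` (for example from 1 to 2 hours) re-bins the same observations at a coarser resolution, which is the kind of resampling-interval variation the abstract reports testing. Returning a mask alongside the values records which entries were actually observed and which were filled, so a downstream model can distinguish the two.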
Pages: 21