Deep representation learning of patient data from Electronic Health Records (EHR): A systematic review

Cited by: 103
Authors
Si, Yuqi [1]
Du, Jingcheng [1]
Li, Zhao [1]
Jiang, Xiaoqian [1]
Miller, Timothy [2,3]
Wang, Fei [4]
Zheng, W. Jim [1]
Roberts, Kirk [1]
Affiliations
[1] Univ Texas Hlth Sci Ctr Houston, Sch Biomed Informat, 7000 Fannin St 600, Houston, TX 77030 USA
[2] Boston Childrens Hosp, Computat Hlth Informat Program CHIP, Boston, MA USA
[3] Harvard Med Sch, Boston, MA 02115 USA
[4] Cornell Univ, Weill Cornell Med, Dept Populat Hlth Sci, Ithaca, NY USA
Funding
National Institutes of Health (USA); National Science Foundation (USA)
Keywords
Systematic review; Electronic health records; Patient representation; Deep learning; Readmission
DOI
10.1016/j.jbi.2020.103671
Chinese Library Classification number
TP39 [Computer applications]
Discipline classification code
081203; 0835
Abstract
Objectives: Patient representation learning refers to learning a dense mathematical representation of a patient that encodes meaningful information from Electronic Health Records (EHRs). This is generally performed using advanced deep learning methods. This study presents a systematic review of this field and provides both qualitative and quantitative analyses from a methodological perspective.
Methods: We identified studies developing patient representations from EHRs with deep learning methods from MEDLINE, EMBASE, Scopus, the Association for Computing Machinery (ACM) Digital Library, and the Institute of Electrical and Electronics Engineers (IEEE) Xplore Digital Library. After screening 363 articles, 49 papers were included for a comprehensive data collection.
Results: Publications developing patient representations almost doubled each year from 2015 until 2019. We noticed a typical workflow starting with feeding raw data, applying deep learning models, and ending with clinical outcome predictions as evaluations of the learned representations. Specifically, learning representations from structured EHR data was dominant (37 out of 49 studies). Recurrent Neural Networks were widely applied as the deep learning architecture (Long short-term memory: 13 studies, Gated recurrent unit: 11 studies). Learning was mainly performed in a supervised manner (30 studies) optimized with cross-entropy loss. Disease prediction was the most common application and evaluation (31 studies). Benchmark datasets were mostly unavailable (28 studies) due to privacy concerns of EHR data, and code availability was assured in 20 studies.
Discussion & Conclusion: The existing predictive models mainly focus on the prediction of single diseases, rather than considering the complex mechanisms of patients from a holistic review. We show the importance and feasibility of learning comprehensive representations of patient EHR data through a systematic review. Advances in patient representation learning techniques will be essential for powering patient-level EHR analyses. Future work will still be devoted to leveraging the richness and potential of available EHR data. Reproducibility and transparency of reported results will hopefully improve. Knowledge distillation and advanced learning techniques will be exploited to assist the capability of learning patient representation further.
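Illustrative sketch (not taken from the reviewed paper or from any of the 49 included studies): the typical workflow named in the abstract, structured EHR input, a recurrent encoder, and a supervised cross-entropy objective on a clinical outcome, can be sketched in a few lines of plain Python. Everything here is an assumption made for illustration: the class name `TinyGRUEncoder`, the six-code vocabulary, the hidden size, the random untrained weights, and the toy three-visit patient. Only the forward pass is shown; no training is performed.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def init_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def multi_hot(codes, n_codes):
    # Encode one visit as a multi-hot vector over a (hypothetical) code vocabulary.
    v = [0.0] * n_codes
    for c in codes:
        v[c] = 1.0
    return v


class TinyGRUEncoder:
    """Hypothetical, untrained GRU that maps a visit sequence to a patient vector."""

    def __init__(self, n_codes, hidden, seed=0):
        rng = random.Random(seed)
        self.hidden = hidden
        # Input (W) and recurrent (U) weights for the update (z), reset (r)
        # and candidate (h) computations.
        self.W = {g: init_matrix(hidden, n_codes, rng) for g in "zrh"}
        self.U = {g: init_matrix(hidden, hidden, rng) for g in "zrh"}
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Blend the previous state with the candidate state.
        return [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]

    def encode(self, visits):
        # The final hidden state serves as the dense patient representation.
        h = [0.0] * self.hidden
        for x in visits:
            h = self.step(x, h)
        return h

    def predict(self, visits):
        # Logistic output head: probability of a binary clinical outcome.
        h = self.encode(visits)
        return sigmoid(sum(w * v for w, v in zip(self.w_out, h)))


def binary_cross_entropy(p, y, eps=1e-12):
    return -(y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps))


N_CODES = 6  # assumed vocabulary size
visits = [multi_hot([0, 2], N_CODES), multi_hot([2, 3], N_CODES), multi_hot([5], N_CODES)]
model = TinyGRUEncoder(N_CODES, hidden=4)
patient_vector = model.encode(visits)
risk = model.predict(visits)
loss = binary_cross_entropy(risk, 1)  # label 1 = outcome occurred (toy label)
print(len(patient_vector), round(risk, 3), round(loss, 3))
```

In a real study the weights would be fitted by minimizing this loss over many patients, and the learned `patient_vector` would then be evaluated on the downstream prediction task.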
Pages: 13