Scalable and accurate deep learning with electronic health records

Cited by: 1424
Authors
Rajkomar, Alvin [1, 2]
Oren, Eyal [1]
Chen, Kai [1]
Dai, Andrew M. [1]
Hajaj, Nissan [1]
Hardt, Michaela [1]
Liu, Peter J. [1]
Liu, Xiaobing [1]
Marcus, Jake [1]
Sun, Mimi [1]
Sundberg, Patrik [1]
Yee, Hector [1]
Zhang, Kun [1]
Zhang, Yi [1]
Flores, Gerardo [1]
Duggan, Gavin E. [1]
Irvine, Jamie [1]
Le, Quoc [1]
Litsch, Kurt [1]
Mossin, Alexander [1]
Tansuwan, Justin [1]
Wang, De [1]
Wexler, James [1]
Wilson, Jimbo [1]
Ludwig, Dana [2]
Volchenboum, Samuel L. [3]
Chou, Katherine [1]
Pearson, Michael [1]
Madabushi, Srinivasan [1]
Shah, Nigam H. [4]
Butte, Atul J. [2]
Howell, Michael D. [1]
Cui, Claire [1]
Corrado, Greg S. [1]
Dean, Jeffrey [1]
Affiliations
[1] Google Inc, Mountain View, CA 94043 USA
[2] Univ Calif San Francisco, San Francisco, CA 94143 USA
[3] Univ Chicago Med, Chicago, IL USA
[4] Stanford Univ, Stanford, CA 94305 USA
Source
NPJ DIGITAL MEDICINE | 2018 / Volume 1
Keywords
RISK PREDICTION MODELS; EARLY WARNING SCORE; BIG DATA; HOSPITAL READMISSION; MEDICAL-RECORDS; VALIDATION; CARE; INPATIENT; ANALYTICS; PATIENT;
DOI
10.1038/s41746-018-0029-1
Chinese Library Classification (CLC) number
R19 [Health care organizations and services (health services administration)];
Abstract
Predictive modeling with electronic health record (EHR) data is anticipated to drive personalized medicine and improve healthcare quality. Constructing predictive statistical models typically requires extraction of curated predictor variables from normalized EHR data, a labor-intensive process that discards the vast majority of information in each patient's record. We propose a representation of patients' entire raw EHR records based on the Fast Healthcare Interoperability Resources (FHIR) format. We demonstrate that deep learning methods using this representation are capable of accurately predicting multiple medical events from multiple centers without site-specific data harmonization. We validated our approach using de-identified EHR data from two US academic medical centers with 216,221 adult patients hospitalized for at least 24 hours. In the sequential format we propose, this volume of EHR data unrolled into a total of 46,864,534,945 data points, including clinical notes. Deep learning models achieved high accuracy for tasks such as predicting: in-hospital mortality (area under the receiver operating characteristic curve [AUROC] across sites 0.93-0.94), 30-day unplanned readmission (AUROC 0.75-0.76), prolonged length of stay (AUROC 0.85-0.86), and all of a patient's final discharge diagnoses (frequency-weighted AUROC 0.90). These models outperformed traditional, clinically used predictive models in all cases. We believe that this approach can be used to create accurate and scalable predictions for a variety of clinical scenarios. In a case study of a particular prediction, we demonstrate that neural networks can be used to identify relevant information from the patient's chart.
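The abstract reports plain AUROC for the binary tasks and a frequency-weighted AUROC for discharge diagnoses. The sketch below illustrates both metrics on toy data. It is not the authors' evaluation code: the abstract does not define the weighting, so the example assumes each diagnosis label is weighted by its number of positive cases, and the function names and the toy labels and scores are hypothetical.

```python
def auroc(labels, scores):
    """AUROC as the probability that a random positive example
    outscores a random negative one (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def frequency_weighted_auroc(label_rows, score_rows):
    """Average of per-label AUROCs, each weighted by that label's
    number of positives (an assumed definition, see the note above)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for labels, scores in zip(label_rows, score_rows):
        weight = sum(labels)  # how often this label occurs
        weighted_sum += weight * auroc(labels, scores)
        total_weight += weight
    return weighted_sum / total_weight


# Toy data: two hypothetical diagnosis labels scored for four patients.
labels_a, scores_a = [1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]
labels_b, scores_b = [1, 0, 0, 0], [0.8, 0.3, 0.2, 0.1]

print(auroc(labels_a, scores_a))
print(frequency_weighted_auroc([labels_a, labels_b], [scores_a, scores_b]))
```

Under this weighting, common diagnoses dominate the summary figure, so a model that ranks rare diagnoses poorly can still score well overall.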
Pages: 10