Predicting Survival From Large Echocardiography and Electronic Health Record Datasets: Optimization With Machine Learning

Citations: 99
Authors
Samad, Manar D. [1]
Ulloa, Alvaro [1]
Wehner, Gregory J. [2]
Jing, Linyuan [1]
Hartzel, Dustin [1]
Good, Christopher W. [3]
Williams, Brent A. [4]
Haggerty, Christopher M. [1]
Fornwalt, Brandon K. [1, 2, 5]
Affiliations
[1] Geisinger, Dept Imaging Sci & Innovat, Danville, PA 17822 USA
[2] Univ Kentucky, Dept Biomed Engn, Lexington, KY USA
[3] Geisinger, Dept Cardiol, Danville, PA 17822 USA
[4] Geisinger, Dept Epidemiol & Hlth Serv Res, Danville, PA 17822 USA
[5] Geisinger, Dept Radiol, Danville, PA 17822 USA
Funding
National Institutes of Health (USA)
Keywords
echocardiography; electronic health records; machine learning; mortality; SEATTLE HEART-FAILURE; MODEL; REGRESSION; MORTALITY;
DOI
10.1016/j.jcmg.2018.04.026
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
OBJECTIVES The goal of this study was to use machine learning to more accurately predict survival after echocardiography.

BACKGROUND Predicting patient outcomes (e.g., survival) following echocardiography is primarily based on ejection fraction (EF) and comorbidities. However, there may be significant predictive information within additional echocardiography-derived measurements combined with clinical electronic health record data.

METHODS Mortality was studied in 171,510 unselected patients who underwent 331,317 echocardiograms in a large regional health system. The authors investigated the predictive performance of nonlinear machine learning models compared with that of linear logistic regression models using 3 different inputs: 1) clinical variables, including 90 cardiovascular-relevant International Classification of Diseases, Tenth Revision, codes, and age, sex, height, weight, heart rate, blood pressures, low-density lipoprotein, high-density lipoprotein, and smoking; 2) clinical variables plus physician-reported EF; and 3) clinical variables and EF, plus 57 additional echocardiographic measurements. Missing data were imputed with the multivariate imputation by chained equations (MICE) algorithm. The authors compared the models with each other and with baseline clinical scoring systems by using the mean area under the curve (AUC) over 10 cross-validation folds and across 10 survival durations (6 to 60 months).

RESULTS Machine learning models achieved significantly higher prediction accuracy (all AUC >0.82) than common clinical risk scores (AUC = 0.61 to 0.79), with the nonlinear random forest models outperforming logistic regression (p < 0.01). The random forest model including all echocardiographic measurements yielded the highest prediction accuracy (p < 0.01 across all models and survival durations). Only 10 variables were needed to achieve 96% of the maximum prediction accuracy, with 6 of these variables being derived from echocardiography. Tricuspid regurgitation velocity was more predictive of survival than left ventricular EF (LVEF). In a subset of studies with complete data for the top 10 variables, multivariate imputation by chained equations yielded slightly reduced predictive accuracies (difference in AUC of 0.003) compared with the original data.

CONCLUSIONS Machine learning can fully utilize large combinations of disparate input variables to predict survival after echocardiography with superior accuracy. (C) 2019 by the American College of Cardiology Foundation.
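The evaluation metric named in the abstract (mean AUC over 10 cross-validation folds) can be illustrated with a minimal sketch. This is not the authors' code: the function names `roc_auc`, `stratified_folds`, and `mean_cv_auc` are hypothetical, the stratified splitting is an assumption (the abstract does not say how folds were formed), and the scores are assumed to come from a model fitted without access to the fold being scored. Only the Python standard library is used.

```python
import random
from statistics import mean


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, k, seed=0):
    """Split indices into k folds, dealing positives and negatives out
    round-robin so each fold keeps both classes (an assumption here)."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    rng.shuffle(pos)
    rng.shuffle(neg)
    return [pos[i::k] + neg[i::k] for i in range(k)]


def mean_cv_auc(labels, scores, k=10, seed=0):
    """Mean of the per-fold AUCs, given out-of-fold predicted scores."""
    fold_aucs = []
    for fold in stratified_folds(labels, k, seed):
        y = [labels[i] for i in fold]
        s = [scores[i] for i in fold]
        fold_aucs.append(roc_auc(y, s))
    return mean(fold_aucs)
```

In the study's setting, this mean would be computed once per model and per survival duration, with `labels` marking death within that duration; the sketch leaves model fitting and imputation out entirely.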
Pages: 681-689
Page count: 9
Related papers
32 in total
  • [11] Echocardiographic Determinants of One-Year All-Cause Mortality in Patients With Chronic Heart Failure Complicated by Significant Functional Tricuspid Regurgitation
    Hu, Kai
    Liu, Dan
    Stoerk, Stefan
    Herrmann, Sebastian
    Oder, Daniel
    Ertl, Georg
    Voelker, Wolfram
    Weidemann, Frank
    Nordbeck, Peter
    [J]. JOURNAL OF CARDIAC FAILURE, 2017, 23 (06) : 434 - 443
  • [12] Improved Cardiovascular Risk Prediction Using Nonparametric Regression and Electronic Health Record Data
    Kennedy, Edward H.
    Wiitala, Wyndy L.
    Hayward, Rodney A.
    Sussman, Jeremy B.
    [J]. MEDICAL CARE, 2013, 51 (03) : 251 - 258
  • [13] Kurgan L., 2005, NEXT GENERATION DATA, P415
  • [14] Imputation of missing data in industrial databases
    Lakshminarayan, K
    Harp, SA
    Samad, T
    [J]. APPLIED INTELLIGENCE, 1999, 11 (03) : 259 - 275
  • [15] POINTS OF SIGNIFICANCE Model selection and overfitting
    Lever, Jake
    Krzywinski, Martin
    Altman, Naomi
    [J]. NATURE METHODS, 2016, 13 (09) : 703 - 704
  • [16] The Seattle heart failure model - Prediction of survival in heart failure
    Levy, WC
    Mozaffarian, D
    Linker, DT
    Sutradhar, SC
    Anker, SD
    Cropp, AB
    Anand, I
    Maggioni, A
    Burton, P
    Sullivan, MD
    Pitt, B
    Poole-Wilson, PA
    Mann, DL
    Packer, M
    [J]. CIRCULATION, 2006, 113 (11) : 1424 - 1433
  • [17] Using machine learning methods for predicting inhospital mortality in patients undergoing open repair of abdominal aortic aneurysm
    Monsalve-Torra, Ana
    Ruiz-Fernandez, Daniel
    Marin-Alonso, Oscar
    Soriano-Paya, Antonio
    Camacho-Mackenzie, Jaime
    Carreno-Jaimes, Marisol
    [J]. JOURNAL OF BIOMEDICAL INFORMATICS, 2016, 62 : 195 - 201
  • [18] Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: a 5-year multicentre prospective registry analysis
    Motwani, Manish
    Dey, Damini
    Berman, Daniel S.
    Germano, Guido
    Achenbach, Stephan
    Al-Mallah, Mouaz H.
    Andreini, Daniele
    Budoff, Matthew J.
    Cademartiri, Filippo
    Callister, Tracy Q.
    Chang, Hyuk-Jae
    Chinnaiyan, Kavitha
    Chow, Benjamin J. W.
    Cury, Ricardo C.
    Delago, Augustin
    Gomez, Millie
    Gransar, Heidi
    Hadamitzky, Martin
    Hausleiter, Joerg
    Hindoyan, Niree
    Feuchtner, Gudrun
    Kaufmann, Philipp A.
    Kim, Yong-Jin
    Leipsic, Jonathon
    Lin, Fay Y.
    Maffei, Erica
    Marques, Hugo
    Pontone, Gianluca
    Raff, Gilbert
    Rubinshtein, Ronen
    Shaw, Leslee J.
    Stehli, Julia
    Villines, Todd C.
    Dunning, Allison
    Min, James K.
    Slomka, Piotr J.
    [J]. EUROPEAN HEART JOURNAL, 2017, 38 (07) : 500 - 507
  • [19] Machine-Learning Algorithms to Automate Morphological and Functional Assessments in 2D Echocardiography
    Narula, Sukrit
    Shameer, Khader
    Omar, Alaa Mabrouk Salem
    Dudley, Joel T.
    Sengupta, Partho P.
    [J]. JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY, 2016, 68 (21) : 2287 - 2295
  • [20] Impact of tricuspid regurgitation on long-term survival
    Nath, J
    Foster, E
    Heidenreich, PA
    [J]. JOURNAL OF THE AMERICAN COLLEGE OF CARDIOLOGY, 2004, 43 (03) : 405 - 409