Logistic regression and machine learning predicted patient mortality from large sets of diagnosis codes comparably

Cited by: 20
Authors
Cowling, Thomas E. [1, 2]
Cromwell, David A. [1, 2]
Bellot, Alexis [3, 4]
Sharples, Linda D. [5]
van der Meulen, Jan [1, 2]
Affiliations
[1] London Sch Hyg & Trop Med, Dept Hlth Serv Res & Policy, Keppel St, London WC1E 7HT, England
[2] Royal Coll Surgeons England, Clin Effectiveness Unit, Lincolns Inn Fields, London WC2A 3PE, England
[3] Univ Cambridge, Dept Appl Math & Theoret Phys, Wilberforce Rd, Cambridge CB3 0WA, England
[4] Alan Turing Inst, 96 Euston Rd, London NW1 2DB, England
[5] London Sch Hyg & Trop Med, Dept Med Stat, Keppel St, London WC1E 7HT, England
Funding
UK Medical Research Council
Keywords
Machine learning; Regression analysis; Big data; Electronic health records; International Classification of Diseases; Comorbidity; Prognosis; COMORBIDITY INDEXES; MODELS; CHARLSON; CARE; PERFORMANCE; VALIDATION; ALGORITHMS; ICD-9-CM
DOI
10.1016/j.jclinepi.2020.12.018
Chinese Library Classification number
R19 [Health care organization and services (health services administration)]
Abstract
Objective: The objective of the study was to compare the performance of logistic regression and boosted trees for predicting patient mortality from large sets of diagnosis codes in electronic healthcare records.
Study Design and Setting: We analyzed national hospital records and official death records for patients with myocardial infarction (n = 200,119), hip fracture (n = 169,646), or colorectal cancer surgery (n = 56,515) in England in 2015-2017. One-year mortality was predicted from patient age, sex, and socioeconomic status, and 202 to 257 International Classification of Diseases 10th Revision codes recorded in the preceding year or not (binary predictors). Performance measures included the c-statistic, scaled Brier score, and several measures of calibration.
Results: One-year mortality was 17.2% (34,520) after myocardial infarction, 27.2% (46,115) after hip fracture, and 9.3% (5,273) after colorectal surgery. Optimism-adjusted c-statistics for the logistic regression models were 0.884 (95% confidence interval [CI]: 0.882, 0.886), 0.798 (0.796, 0.800), and 0.811 (0.805, 0.817). The equivalent c-statistics for the boosted tree models were 0.891 (95% CI: 0.889, 0.892), 0.804 (0.802, 0.806), and 0.803 (0.797, 0.809). Model performance was also similar when measured using scaled Brier scores. All models were well calibrated overall.
Conclusion: In large datasets of electronic healthcare records, logistic regression and boosted tree models of numerous diagnosis codes predicted patient mortality comparably.
© 2020 Elsevier Inc. All rights reserved.
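The two headline performance measures named in the abstract, the c-statistic and the scaled Brier score, can be illustrated with a minimal sketch. This is not the authors' code: the function names and the toy data are hypothetical, the c-statistic is computed by brute-force pair counting (ties scored 0.5), and the scaled Brier score is assumed to follow the common definition 1 - Brier / Brier_max, where Brier_max is the Brier score of a model that predicts the observed event rate for every patient. No optimism adjustment is attempted here.

```python
def c_statistic(y_true, y_prob):
    """Share of (event, non-event) pairs in which the event has the higher
    predicted risk; tied predictions count as half a concordant pair."""
    events = [p for y, p in zip(y_true, y_prob) if y == 1]
    non_events = [p for y, p in zip(y_true, y_prob) if y == 0]
    concordant = 0.0
    for p_event in events:
        for p_non_event in non_events:
            if p_event > p_non_event:
                concordant += 1.0
            elif p_event == p_non_event:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


def scaled_brier(y_true, y_prob):
    """1 - Brier / Brier_max (assumed definition): 1 is a perfect model,
    0 is no better than predicting the overall event rate for everyone."""
    n = len(y_true)
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / n
    event_rate = sum(y_true) / n
    brier_max = event_rate * (1 - event_rate)
    return 1 - brier / brier_max


# Hypothetical toy data: 1 = died within one year, 0 = survived.
y = [1, 0, 1, 0, 0, 1, 0, 0]
p = [0.9, 0.2, 0.6, 0.4, 0.1, 0.3, 0.3, 0.05]

print(round(c_statistic(y, p), 3))
print(round(scaled_brier(y, p), 3))
```

The pairwise loop is quadratic in the number of patients, so it only suits small illustrations; for cohorts of the size reported above, a rank-based computation would be the practical choice.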
Pages: 43-52
Number of pages: 10