Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

Cited by: 54
Authors
Austin, Peter C. [1 ,2 ,3 ]
Lee, Douglas S. [1 ,2 ,5 ,6 ,7 ]
Steyerberg, Ewout W. [8 ]
Tu, Jack V. [1 ,2 ,4 ,5 ]
Affiliations
[1] Inst Clin Evaluat Sci, Toronto, ON, Canada
[2] Univ Toronto, Inst Hlth Policy Management & Evaluat, Toronto, ON M5S 1A1, Canada
[3] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
[4] Univ Toronto, Div Cardiol, Sunnybrook Schulich Heart Ctr, Toronto, ON, Canada
[5] Univ Toronto, Fac Med, Toronto, ON, Canada
[6] Univ Hlth Network, Peter Munk Cardiac Ctr, Toronto, ON, Canada
[7] Univ Hlth Network, Dept Med, Toronto, ON, Canada
[8] Erasmus MC, Dept Publ Hlth, Rotterdam, Netherlands
Funding
Canadian Institutes of Health Research
Keywords
Acute myocardial infarction; Bagging; Boosting; Data mining; Heart failure; LOGISTIC-REGRESSION; HEART-FAILURE; VALIDATION; DERIVATION; MODELS;
DOI
10.1002/bimj.201100251
Chinese Library Classification number
Q [Biological sciences]
Subject classification codes
07; 0710; 09
Abstract
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999-2001 and 2004-2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease.
Pages: 657-673
Page count: 17
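
The bootstrap aggregation (bagging) idea named in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: it assumes a single synthetic predictor, a simulated binary outcome, and one-split regression trees instead of full trees on clinical covariates. All function names (`fit_stump`, `fit_bagged`, `c_statistic`, `simulate`) and the simulation coefficients are hypothetical choices made for illustration.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree on one predictor by least squares."""
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    total = sum(y for _, y in pairs)
    best = None  # (score, threshold, left_mean, right_mean)
    left_sum = 0.0
    for i in range(1, n):
        left_sum += pairs[i - 1][1]
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        left_mean = left_sum / i
        right_mean = (total - left_sum) / (n - i)
        # Minimising squared error is equivalent to maximising this score.
        score = i * left_mean ** 2 + (n - i) * right_mean ** 2
        if best is None or score > best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (score, threshold, left_mean, right_mean)
    if best is None:  # no valid split: always predict the overall mean
        return (math.inf, total / n, total / n)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_bagged(xs, ys, n_trees, rng):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    n = len(xs)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        trees.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return trees


def predict_bagged(trees, x):
    """Average the predicted probabilities over all bootstrap trees."""
    return sum(predict_stump(t, x) for t in trees) / len(trees)


def c_statistic(probs, ys):
    """Share of (event, non-event) pairs ranked correctly; ties count 0.5."""
    pos = [p for p, y in zip(probs, ys) if y == 1]
    neg = [p for p, y in zip(probs, ys) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def simulate(n, rng):
    """Synthetic data: event risk rises smoothly with x (assumed logistic)."""
    xs = [rng.random() for _ in range(n)]
    ys = [1 if rng.random() < 1 / (1 + math.exp(-(-2 + 4 * x))) else 0
          for x in xs]
    return xs, ys


rng = random.Random(0)
train_x, train_y = simulate(400, rng)
test_x, test_y = simulate(400, rng)

single = fit_stump(train_x, train_y)
bagged = fit_bagged(train_x, train_y, 50, rng)

# Out-of-sample discrimination on the held-out synthetic sample.
auc_single = c_statistic([predict_stump(single, x) for x in test_x], test_y)
auc_bagged = c_statistic([predict_bagged(bagged, x) for x in test_x], test_y)
print(f"single tree c-statistic:  {auc_single:.3f}")
print(f"bagged trees c-statistic: {auc_bagged:.3f}")
```

A single stump can output only two distinct probabilities, whereas averaging over bootstrap samples yields a finer-grained risk score. This is one intuition for why bagging may improve on a single tree when the true risk varies smoothly with a predictor.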