Regression trees for predicting mortality in patients with cardiovascular disease: What improvement is achieved by using ensemble-based methods?

Cited by: 54
Authors
Austin, Peter C. [1, 2, 3]
Lee, Douglas S. [1, 2, 5, 6, 7]
Steyerberg, Ewout W. [8]
Tu, Jack V. [1, 2, 4, 5]
Affiliations
[1] Inst Clin Evaluat Sci, Toronto, ON, Canada
[2] Univ Toronto, Inst Hlth Policy Management & Evaluat, Toronto, ON M5S 1A1, Canada
[3] Univ Toronto, Dalla Lana Sch Publ Hlth, Toronto, ON, Canada
[4] Univ Toronto, Div Cardiol, Sunnybrook Schulich Heart Ctr, Toronto, ON, Canada
[5] Univ Toronto, Fac Med, Toronto, ON, Canada
[6] Univ Hlth Network, Peter Munk Cardiac Ctr, Toronto, ON, Canada
[7] Univ Hlth Network, Dept Med, Toronto, ON, Canada
[8] Erasmus MC, Dept Publ Hlth, Rotterdam, Netherlands
Funding
Canadian Institutes of Health Research
Keywords
Acute myocardial infarction; Bagging; Boosting; Data mining; Heart failure; LOGISTIC-REGRESSION; HEART-FAILURE; VALIDATION; DERIVATION; MODELS;
DOI
10.1002/bimj.201100251
Chinese Library Classification (CLC)
Q [Biological Sciences]
Discipline classification codes
07 ; 0710 ; 09 ;
Abstract
In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1999–2001 and 2004–2005). We found that both the in-sample and out-of-sample prediction of ensemble methods offered substantial improvement in predicting cardiovascular mortality compared to conventional regression trees. However, conventional logistic regression models that incorporated restricted cubic smoothing splines had even better performance. We conclude that ensemble methods from the data mining and machine learning literature increase the predictive performance of regression trees, but may not lead to clear advantages over conventional logistic regression models for predicting short-term mortality in population-based samples of subjects with cardiovascular disease.
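To illustrate what bootstrap aggregation (bagging) of regression trees means for a binary outcome, here is a minimal sketch. It is not the authors' analysis: the data are simulated, there is a single hypothetical predictor (an age-like variable), each tree is reduced to a one-split "stump", and all function names (`simulate`, `fit_stump`, `fit_bagged`, `brier`) are invented for this example. The Brier score is used here only as one convenient accuracy measure.

```python
import math
import random


def simulate(n, seed=0):
    """Simulated cohort (assumption): one age-like predictor, binary outcome."""
    rng = random.Random(seed)
    xs = [rng.randint(40, 90) for _ in range(n)]
    # Risk rises with x through a logistic curve; the parameters are arbitrary.
    ys = [1 if rng.random() < 1 / (1 + math.exp(-(x - 75) / 5)) else 0 for x in xs]
    return xs, ys


def fit_stump(xs, ys):
    """Depth-1 regression tree: pick the split that minimises squared error."""
    overall = sum(ys) / len(ys)
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # no valid split: predict the overall mean everywhere
        return (float("inf"), overall, overall)
    return best[1:]  # (threshold, left-leaf mean, right-leaf mean)


def predict_stump(stump, x):
    t, ml, mr = stump
    return ml if x <= t else mr


def fit_bagged(xs, ys, n_trees=50, seed=0):
    """Fit one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_bagged(stumps, x):
    """Bagged prediction: average the predicted probabilities of all stumps."""
    return sum(predict_stump(s, x) for s in stumps) / len(stumps)


def brier(preds, ys):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)


if __name__ == "__main__":
    train_x, train_y = simulate(300, seed=1)
    test_x, test_y = simulate(300, seed=2)  # stands in for out-of-sample data
    single = fit_stump(train_x, train_y)
    bagged = fit_bagged(train_x, train_y, n_trees=50, seed=3)
    print("single tree Brier:", brier([predict_stump(single, x) for x in test_x], test_y))
    print("bagged trees Brier:", brier([predict_bagged(bagged, x) for x in test_x], test_y))
```

Averaging over bootstrap resamples turns the single step-shaped prediction into a smoother risk curve, which is the mechanism by which bagging can improve on one tree. Whether it does so on any given dataset, and how it compares with logistic regression, is an empirical question of the kind the study addresses.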
Pages: 657-673
Page count: 17