Super Learner for Survival Data Prediction

Cited by: 17
Authors
Golmakani, Marzieh K. [1 ]
Polley, Eric C. [2 ]
Affiliations
[1] Pfizer Inc, San Diego, CA 92121 USA
[2] Mayo Clin Minnesota, Hlth Sci Res, Rochester, MN USA
Keywords
super learner; cross-validation; concordance index; regularized Cox regression; CoxBoost; gradient boosted machines; adjuvant therapy; variable selection; Cox regression; levamisole; likelihood; models; fluorouracil; carcinoma
DOI
10.1515/ijb-2019-0065
CLC Number
Q [Biological Sciences]
Subject Classification Codes
07; 0710; 09
Abstract
Survival analysis is a widely used method for relating a time-to-event outcome to a set of potential covariates, and accurately predicting the time of an event of interest is of primary importance. Many different algorithms have been proposed for survival prediction; however, for a given prediction problem it is rarely, if ever, possible to know in advance which algorithm will perform best. In this paper we propose two algorithms for constructing super learners in survival data prediction in which the individual algorithms are based on proportional hazards. A super learner is a flexible approach to statistical learning that finds the best weighted ensemble of the individual algorithms. Finding the optimal combination of the individual algorithms by minimizing cross-validated risk controls for over-fitting of the final ensemble learner. Candidate algorithms may range from a basic Cox model to tree-based machine learning algorithms, provided all candidates are based on the proportional hazards framework. The ensemble weights are estimated by minimizing the cross-validated negative log partial likelihood. We compare the performance of the proposed super learners with existing models through extensive simulation studies. In all simulation scenarios, the proposed super learners are either the best fit or near the best fit. The performance of the newly proposed algorithms is also demonstrated with clinical data examples.
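The weight-estimation step described in the abstract — choosing convex ensemble weights that minimize the cross-validated negative log partial likelihood — can be sketched in Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, SciPy's SLSQP optimizer is an assumed choice, and the likelihood uses a Breslow-style form that ignores ties.

```python
import numpy as np
from scipy.optimize import minimize


def neg_log_partial_lik(lp, time, event):
    """Breslow-style negative log partial likelihood (ties ignored)."""
    order = np.argsort(time)
    lp, event = lp[order], event[order]
    # Risk-set term: for subject i, log sum_{j: t_j >= t_i} exp(lp_j),
    # computed as a reverse cumulative log-sum-exp for numerical stability.
    log_risk = np.logaddexp.accumulate(lp[::-1])[::-1]
    return -np.sum((lp - log_risk)[event == 1])


def super_learner_weights(Z, time, event):
    """Convex ensemble weights minimizing the negative log partial likelihood.

    Z : (n, K) array of cross-validated linear predictors, one column per
        candidate proportional-hazards algorithm (hypothetical interface).
    """
    K = Z.shape[1]
    objective = lambda w: neg_log_partial_lik(Z @ w, time, event)
    constraints = ({"type": "eq", "fun": lambda w: w.sum() - 1.0},)
    bounds = [(0.0, 1.0)] * K
    res = minimize(objective, np.full(K, 1.0 / K), method="SLSQP",
                   bounds=bounds, constraints=constraints)
    return res.x
```

In the full procedure, each column of `Z` would hold the linear predictors obtained by fitting one candidate algorithm within V-fold cross-validation and predicting on the held-out folds, so the weights are selected on out-of-fold performance and the ensemble does not over-fit to any single candidate's in-sample fit.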
Pages: 13