Individual risk prediction: Comparing random forests with Cox proportional-hazards model by a simulation study

Cited by: 14
Authors
Baralou, Valia [1]
Kalpourtzi, Natasa [1]
Touloumi, Giota [1]
Affiliations
[1] Natl & Kapodistrian Univ Athens, Med Sch, Dept Hyg Epidemiol & Med Stat, Athens 11527, Greece
Keywords
Cox model; machine learning; random survival forest; survival analysis; RANDOM SURVIVAL FORESTS; CARDIOVASCULAR-DISEASE; LIFE-STYLE; REGRESSION; SCORE
DOI
10.1002/bimj.202100380
CLC number
Q [Biological Sciences]
Subject classification code
07; 0710; 09
Abstract
With big data becoming widely available in healthcare, machine learning algorithms such as random forest (RF), which ignores time-to-event information, and random survival forest (RSF), which handles right-censored data, are used for individual risk prediction as alternatives to the Cox proportional hazards (Cox-PH) model. We aimed to systematically compare RF and RSF with Cox-PH. RSF with three split criteria [log-rank (RSF-LR), log-rank score (RSF-LRS), maximally selected rank statistics (RSF-MSR)], RF, Cox-PH, and Cox-PH with splines (Cox-S) were evaluated through a simulation study based on real data. One hundred eighty scenarios were investigated, assuming different associations between the predictors and the outcome (linear/linear and interactions/nonlinear/nonlinear and interactions), training sample sizes (500/1000/5000), censoring rates (50%/75%/93%), hazard functions (increasing/decreasing/constant), and numbers of predictors (seven, or 15 including noise variables). The methods' performance was evaluated with the time-dependent area under the curve and the integrated Brier score. In all scenarios, RF had the worst performance. In scenarios with a low number of events (≤ 70), Cox-PH was at least noninferior to RSF, whereas under the linearity assumption it outperformed RSF. In the presence of interactions, RSF performed better than Cox-PH as the number of events increased, whereas Cox-S performed at least as well as RSF under nonlinear effects. RSF-LRS performed slightly worse than RSF-LR and RSF-MSR when noise variables and interaction effects were included. When applied to real data, models incorporating survival time performed better. Although RSF algorithms are a promising alternative to the conventional Cox-PH model as data complexity increases, they require a higher number of events for training. In time-to-event analysis, algorithms that consider survival time should be used.
Pages: 13