Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers

Cited by: 229
Authors
Deist, Timo M. [1 ,2 ]
Dankers, Frank J. W. M. [2 ,3 ]
Valdes, Gilmer [4 ]
Wijsman, Robin [3 ]
Hsu, I-Chow [4 ]
Oberije, Cary [2 ]
Lustberg, Tim [5 ]
van Soest, Johan [5 ]
Hoebers, Frank [5 ]
Jochems, Arthur [1 ,2 ]
El Naqa, Issam [6 ]
Wee, Leonard
Morin, Olivier [4 ]
Raleigh, David R. [4 ]
Bots, Wouter [3 ,8 ]
Kaanders, Johannes H. [3 ]
Belderbos, Jose [7 ]
Kwint, Margriet [7 ]
Solberg, Timothy [4 ]
Monshouwer, Rene [3 ]
Bussink, Johan [3 ]
Dekker, Andre [5 ]
Lambin, Philippe [1 ]
Affiliations
[1] Maastricht Univ, Med Ctr, Sch Oncol & Dev Biol, D Lab Decis Support Precis Med,GROW, Univ Singel 40, NL-6229 ER Maastricht, Netherlands
[2] Maastricht Univ, Med Ctr, Sch Oncol & Dev Biol, Dept Radiat Oncol,GROW, Maastricht, Netherlands
[3] Radboud Univ Nijmegen, Med Ctr, Dept Radiat Oncol, Nijmegen, Netherlands
[4] Univ Calif San Francisco, Dept Radiat Oncol, San Francisco, CA USA
[5] Maastricht Univ, Med Ctr, Sch Oncol & Dev Biol, Dept Radiat Oncol MAASTRO,GROW, Maastricht, Netherlands
[6] Univ Michigan, Dept Radiat Oncol, Ann Arbor, MI 48109 USA
[7] Antoni van Leeuwenhoek Hosp, Netherlands Canc Inst, Dept Radiat Oncol, Amsterdam, Netherlands
[8] Inst Hyperbar Oxygen IvHG, Arnhem, Netherlands
Funding
European Union Horizon 2020
Keywords
classification; machine learning; outcome prediction; predictive modeling; radiotherapy; CELL LUNG-CANCER; ACUTE ESOPHAGEAL TOXICITY; SURVIVAL PREDICTION; MODEL; RADIOTHERAPY;
DOI
10.1002/mp.12967
Chinese Library Classification number
R8 [Special medicine]; R445 [Diagnostic imaging];
Discipline classification code
1002 ; 100207 ; 1009 ;
Abstract
Purpose: Machine learning classification algorithms (classifiers) for prediction of treatment response are becoming more popular in radiotherapy literature. General machine learning literature provides evidence in favor of some classifier families (random forest, support vector machine, gradient boosting) in terms of classification performance. The purpose of this study is to compare such classifiers specifically for (chemo)radiotherapy datasets and to estimate their average discriminative performance for radiation treatment outcome prediction.

Methods: We collected 12 datasets (3496 patients) from prior studies on post-(chemo)radiotherapy toxicity, survival, or tumor control with clinical, dosimetric, or blood biomarker features from multiple institutions and for different tumor sites, that is, (non-)small-cell lung cancer, head and neck cancer, and meningioma. Six common classification algorithms with built-in feature selection (decision tree, random forest, neural network, support vector machine, elastic net logistic regression, LogitBoost) were applied on each dataset using the popular open-source R package caret. The R code and documentation for the analysis are available online (). All classifiers were run on each dataset in a 100-repeated nested fivefold cross-validation with hyperparameter tuning. Performance metrics (AUC, calibration slope and intercept, accuracy, Cohen's kappa, and Brier score) were computed. We ranked classifiers by AUC to determine which classifier is likely to also perform well in future studies. We simulated the benefit for potential investigators of selecting a classifier for a new dataset based on our study (pre-selection based on other datasets) or of estimating the best classifier for a dataset (set-specific selection based on information from the new dataset), each compared with uninformed classifier selection (random selection).
Results: Random forest (best in 6/12 datasets) and elastic net logistic regression (best in 4/12 datasets) showed the overall best discrimination, but there was no single best classifier across datasets. Both classifiers had a median AUC rank of 2. Preselection and set-specific selection yielded a significant average AUC improvement of 0.02 and 0.02 over random selection with an average AUC rank improvement of 0.42 and 0.66, respectively.

Conclusion: Random forest and elastic net logistic regression yield higher discriminative performance in (chemo)radiotherapy outcome and toxicity prediction than other studied classifiers. Thus, one of these two classifiers should be the first choice for investigators when building classification models or to benchmark one's own modeling results against. Our results also show that an informed preselection of classifiers based on existing datasets can improve discrimination over random selection.
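
The AUC ranking and the pre-selection comparison described in the abstract can be illustrated with a small sketch. This is not the authors' R/caret code: it is written in Python, the AUC values are invented for illustration only, and the classifier labels (`rf`, `glmnet`, `svm`, `tree`) and function names are hypothetical. Pre-selection is approximated here as a leave-one-dataset-out choice of the classifier with the best mean rank on the remaining datasets, and random selection as the mean AUC over all classifiers; the study's actual simulation may differ.

```python
from statistics import mean, median

# Hypothetical AUC values (NOT taken from the paper): dataset -> classifier -> AUC.
auc = {
    "set_A": {"rf": 0.70, "glmnet": 0.68, "svm": 0.62, "tree": 0.58},
    "set_B": {"rf": 0.65, "glmnet": 0.66, "svm": 0.64, "tree": 0.60},
    "set_C": {"rf": 0.75, "glmnet": 0.73, "svm": 0.69, "tree": 0.71},
    "set_D": {"rf": 0.61, "glmnet": 0.63, "svm": 0.57, "tree": 0.55},
}


def rank_by_auc(scores):
    """Rank classifiers within one dataset; rank 1 is the highest AUC.

    Ties are not treated specially in this sketch.
    """
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


def median_ranks(table):
    """Median AUC rank of each classifier across all datasets."""
    ranks = [rank_by_auc(scores) for scores in table.values()]
    classifiers = next(iter(table.values()))
    return {name: median(r[name] for r in ranks) for name in classifiers}


def preselection_gain(table):
    """Average AUC gain of pre-selection over random selection.

    For each held-out dataset, pick the classifier with the best mean rank
    on the other datasets, then compare its held-out AUC with the mean AUC
    over all classifiers (the expected AUC of a uniformly random choice).
    """
    gains = []
    for held_out, scores in table.items():
        other_ranks = [
            rank_by_auc(s) for name, s in table.items() if name != held_out
        ]
        mean_rank = {c: mean(r[c] for r in other_ranks) for c in scores}
        chosen = min(mean_rank, key=mean_rank.get)
        gains.append(scores[chosen] - mean(scores.values()))
    return mean(gains)


if __name__ == "__main__":
    print(median_ranks(auc))
    print(round(preselection_gain(auc), 4))
```

With real results, `auc` would hold the cross-validated AUC of each classifier on each dataset; a positive return value from `preselection_gain` indicates that choosing a classifier from prior datasets beats an uninformed choice on average.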
Pages: 3449-3459
Page count: 11
Related papers
34 in total
[1]   Acute esophageal toxicity in non-small cell lung cancer patients after high dose conformal radiotherapy [J].
Belderbos, J ;
Heemsbergen, W ;
Hoogeman, M ;
Pengel, K ;
Rossi, M ;
Lebesque, J .
RADIOTHERAPY AND ONCOLOGY, 2005, 75 (02) :157-164
[2]   Reirradiation of head and neck cancer: Long-term disease control and toxicity [J].
Bots, Wouter T. C. ;
van den Bosch, Sven ;
Zwijnenburg, Ellen M. ;
Dijkema, Tim ;
van den Broek, Guido B. ;
Weijs, Willem L. J. ;
Verhoef, Lia C. G. ;
Kaanders, Johannes H. A. M. .
HEAD AND NECK-JOURNAL FOR THE SCIENCES AND SPECIALTIES OF THE HEAD AND NECK, 2017, 39 (06) :1122-1130
[3]   Random forests [J].
Breiman, L .
MACHINE LEARNING, 2001, 45 (01) :5-32
[4]   Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission [J].
Caruana, Rich ;
Lou, Yin ;
Gehrke, Johannes ;
Koch, Paul ;
Sturm, Marc ;
Elhadad, Noemie .
KDD'15: PROCEEDINGS OF THE 21ST ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, 2015, :1721-1730
[5]  
Carvalho S, 2016, DATA PROGNOSTIC VALU, DOI 10.17195/candat.2016.04.1
[6]   Prognostic value of blood-biomarkers related to hypoxia, inflammation, immune response and tumour load in non-small cell lung cancer - A survival model with external validation [J].
Carvalho, Sara ;
Troost, Esther G. C. ;
Bons, Judith ;
Menheere, Paul ;
Lambin, Philippe ;
Oberije, Cary .
RADIOTHERAPY AND ONCOLOGY, 2016, 119 (03) :487-494
[7]  
Deist TM, CODE MACHINE LEARNIN
[8]   Development and validation of a nomogram for prediction of survival and local control in laryngeal carcinoma patients treated with radiotherapy alone: A cohort study based on 994 patients [J].
Egelmeer, Ada G. T. M. ;
Velazquez, E. Rios ;
de Jong, Jos M. A. ;
Oberije, Cary ;
Geussens, Yasmyne ;
Nuyts, Sandra ;
Kremer, Bernd ;
Rietveld, Derek ;
Leemans, C. Rene ;
de Jong, Monique C. ;
Rasch, Coen ;
Hoebers, Frank ;
Homer, Jarrod ;
Slevin, Nick ;
West, Catharine ;
Lambin, Philippe .
RADIOTHERAPY AND ONCOLOGY, 2011, 100 (01) :108-115
[9]  
Fernández-Delgado M, 2014, J MACH LEARN RES, V15, P3133
[10]   Regularization Paths for Generalized Linear Models via Coordinate Descent [J].
Friedman, Jerome ;
Hastie, Trevor ;
Tibshirani, Rob .
JOURNAL OF STATISTICAL SOFTWARE, 2010, 33 (01) :1-22