Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort

Cited by: 6
Authors
Tang, Mo [1]
Gao, Lihao [2]
He, Bin [1]
Yang, Yufei [1]
Affiliations
[1] China Acad Chinese Med Sci, Oncol Dept, Xiyuan Hosp, Beijing, Peoples R China
[2] Baidu Inc, Smart City Business Unit, 51 Dezhen Rd, Beijing 100091, Peoples R China
Source
CANCER MANAGEMENT AND RESEARCH | 2022, Vol. 14
Keywords
colon cancer; machine learning; extreme gradient boosting; prognostic; ARTIFICIAL-INTELLIGENCE; COLORECTAL-CANCER; SURVIVAL; REGRESSION; CLASSIFICATION; OUTCOMES; TOOL
DOI
10.2147/CMAR.S340739
Chinese Library Classification
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: This study aimed to develop machine learning (ML)-based prognostic prediction models for non-metastatic colon cancer (CRC) that provide a precise quantitative risk assessment and can assist in developing treatment strategies. It also investigated whether nonlinear methods improve prediction accuracy over linear methods.
Patients and Methods: Cancer-specific survival (CSS) models were constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms. They were trained on Surveillance, Epidemiology, and End Results (SEER) data for 15,254 patients with non-metastatic CRC, split into training (70%) and internal validation (30%) datasets, and externally validated on an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were repeated 100 times to obtain average scores and 95% confidence intervals. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC).
Results: XGBoost achieved the highest AUC values for the one-, three-, and five-year CSS cohorts: 0.86 (0.84-0.88), 0.82 (0.81-0.83), and 0.81 (0.79-0.82), respectively, along with relatively high generalization ability. XGBoost also performed best for the R&M models, with AUC values of 0.71 (0.64-0.79), 0.79 (0.74-0.86), and 0.89 (0.82-0.95) for the one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance differed between the CSS and R&M models, and higher model accuracy was associated with the inclusion of more prognostic predictors.
Conclusion: Three ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The nonlinear XGBoost approach performed best, suggesting that it can be used to quantify prognostic risk. Model performance also improved when more prognostic predictors were considered.
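The evaluation protocol summarized in the abstract (a 70/30 split, 100 repetitions, mean AUC with a 95% interval) can be illustrated with a minimal, standard-library Python sketch. Several things here are assumptions, not details from the study: the data are synthetic, `fit_mean_difference` is a hypothetical placeholder scorer rather than logistic regression, random forest, or XGBoost, the split is re-drawn on every repetition, and the interval is taken as the 2.5th to 97.5th percentiles of the repeated AUC values (the abstract does not say how its confidence intervals were computed).

```python
import random
import statistics


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fit_mean_difference(X, y):
    """Hypothetical placeholder model: score = projection onto the difference of class means."""
    n_features = len(X[0])

    def class_mean(label):
        rows = [x for x, t in zip(X, y) if t == label]
        return [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]

    w = [a - b for a, b in zip(class_mean(1), class_mean(0))]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))


def repeated_split_auc(X, y, fit, n_repeats=100, test_frac=0.3, seed=0):
    """Repeat a random train/validation split; return mean AUC and a percentile interval."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_test = int(round(test_frac * len(idx)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([y[i] for i in test], [model(X[i]) for i in test]))
    # n=40 yields cut points every 2.5%; the first and last bound the central 95%.
    cuts = statistics.quantiles(aucs, n=40)
    return statistics.mean(aucs), cuts[0], cuts[-1]


def make_toy_data(n=400, seed=1):
    """Synthetic two-feature data: only the first feature carries signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        X.append([rng.gauss(float(label), 1.0), rng.gauss(0.0, 1.0)])
        y.append(label)
    return X, y


X, y = make_toy_data()
mean_auc, lower, upper = repeated_split_auc(X, y, fit_mean_difference)
print(f"AUC {mean_auc:.2f} ({lower:.2f}-{upper:.2f})")
```

Swapping `fit_mean_difference` for any function that takes training features and labels and returns a scoring callable would let the same loop compare several algorithms under identical splits.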
Pages: 25-35
Number of pages: 11