Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort

Cited by: 4
Authors
Tang, Mo [1]
Gao, Lihao [2]
He, Bin [1]
Yang, Yufei [1]
Affiliations
[1] China Acad Chinese Med Sci, Oncol Dept, Xiyuan Hosp, Beijing, Peoples R China
[2] Baidu Inc, Smart City Business Unit, 51 Dezhen Rd, Beijing 100091, Peoples R China
Source
CANCER MANAGEMENT AND RESEARCH | 2022 / Vol. 14
Keywords
colon cancer; machine learning; extreme gradient boosting; prognostic; ARTIFICIAL-INTELLIGENCE; COLORECTAL-CANCER; SURVIVAL; REGRESSION; CLASSIFICATION; OUTCOMES; TOOL
DOI
10.2147/CMAR.S340739
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: This study aimed to develop machine learning (ML) prognostic prediction models for non-metastatic colon cancer (CRC) that provide a precise quantitative risk assessment and can assist in developing treatment strategies. It also investigated whether nonlinear methods improve prediction accuracy over linear methods.
Patients and Methods: A cancer-specific survival (CSS) model was constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms. It was trained on Surveillance, Epidemiology, and End Results (SEER) data for 15,254 patients with non-metastatic CRC, split into training (70%) and internal validation (30%) datasets, and externally validated with an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were repeated 100 times to obtain average scores and 95% confidence intervals. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC).
Results: XGBoost achieved the highest AUC values, 0.86 (0.84-0.88), 0.82 (0.81-0.83), and 0.81 (0.79-0.82) for the one-, three-, and five-year CSS cohorts, respectively, along with relatively high generalization ability. XGBoost also performed best for the R&M model, with AUC values of 0.71 (0.64-0.79), 0.79 (0.74-0.86), and 0.89 (0.82-0.95) for the one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance differed between the CSS and R&M models, and higher model accuracy was associated with more prognostic predictors.
Conclusion: Three ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The nonlinear XGBoost approach showed the best predictive performance, suggesting that it can be used to quantify prognostic risk. Model performance also improved when more prognostic predictors were considered.
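The evaluation protocol described in the abstract (a 70%/30% split, 100 repetitions, mean AUC with a 95% interval) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the abstract does not say how the 95% confidence interval was derived, so the percentile interval over repeated splits is an assumption, and `make_synthetic_rows` and `fit_mean_difference` are hypothetical stand-ins for the real cohort data and for the logistic regression, random forest, and XGBoost models.

```python
import random
import statistics


def auc_score(labels, scores):
    """AUC as the chance that a random positive case outscores a random negative one (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_split_auc(rows, fit, n_repeats=100, train_frac=0.7, seed=0):
    """Repeat a random train/validation split and summarize the validation AUC.

    rows: list of (features, label) pairs.
    fit:  callable taking training rows and returning a scoring function.
    Returns (mean AUC, 2.5th percentile, 97.5th percentile); the percentile
    interval is an assumption, not a detail taken from the abstract.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train, valid = shuffled[:cut], shuffled[cut:]
        score = fit(train)
        aucs.append(auc_score([y for _, y in valid], [score(x) for x, _ in valid]))
    cuts = statistics.quantiles(aucs, n=40)  # 39 cut points: first is 2.5%, last is 97.5%
    return statistics.mean(aucs), cuts[0], cuts[-1]


def make_synthetic_rows(n, seed=0):
    """Hypothetical data: one risk feature that is shifted upward for positive cases."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < 0.3 else 0
        rows.append(([rng.gauss(1.0 if y else 0.0, 1.0)], y))
    return rows


def fit_mean_difference(train):
    """Hypothetical stand-in model: learn only the direction of the single feature."""
    pos = [x[0] for x, y in train if y == 1]
    neg = [x[0] for x, y in train if y == 0]
    direction = 1.0 if statistics.mean(pos) >= statistics.mean(neg) else -1.0
    return lambda x: direction * x[0]


if __name__ == "__main__":
    mean_auc, low, high = repeated_split_auc(make_synthetic_rows(400, seed=1), fit_mean_difference)
    print(f"AUC {mean_auc:.2f} ({low:.2f}-{high:.2f})")
```

Any model that exposes a scoring function could be passed as `fit`, so the same loop would allow a like-for-like comparison of linear and nonlinear algorithms on identical splits.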
Pages: 25-35
Page count: 11