Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort

Cited by: 4
Authors
Tang, Mo [1]
Gao, Lihao [2]
He, Bin [1]
Yang, Yufei [1]
Affiliations
[1] China Acad Chinese Med Sci, Oncol Dept, Xiyuan Hosp, Beijing, Peoples R China
[2] Baidu Inc, Smart City Business Unit, 51 Dezhen Rd, Beijing 100091, Peoples R China
Source
CANCER MANAGEMENT AND RESEARCH | 2022 / Vol. 14
Keywords
colon cancer; machine learning; extreme gradient boosting; prognostic; ARTIFICIAL-INTELLIGENCE; COLORECTAL-CANCER; SURVIVAL; REGRESSION; CLASSIFICATION; OUTCOMES; TOOL
DOI
10.2147/CMAR.S340739
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: The present study aimed to develop prognostic prediction models based on machine learning (ML) for non-metastatic colon cancer (CRC) that can provide a precise quantitative risk assessment and serve as an assistive method for treatment strategy development. It also investigated whether nonlinear methods can improve prediction accuracy compared to linear methods.
Patients and Methods: A cancer-specific survival (CSS) model constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms was trained on the Surveillance, Epidemiology, and End Results datasets for 15,254 patients with non-metastatic CRC (split into training [70%] and internal validation [30%] datasets) and externally validated with an outpatient cohort of 311 cases from Xiyuan Hospital in China. The Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were performed 100 times to obtain average scores and 95% confidence intervals. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC).
Results: The XGBoost approach showed the highest AUC values of 0.86 (0.84-0.88), 0.82 (0.81-0.83), and 0.81 (0.79-0.82) for the one-, three-, and five-year CSS cohorts, respectively, along with a relatively high generalization ability. The XGBoost approach also performed best for the R&M model, with AUC values of 0.71 (0.64-0.79), 0.79 (0.74-0.86), and 0.89 (0.82-0.95) for the one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance differed between the CSS and R&M models, and higher model accuracy was associated with more prognostic predictors.
Conclusion: Three different ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The predictive performance results showed that the nonlinear XGBoost approach performed best, suggesting that it can be used for quantifying the prognostic risk. It was also demonstrated that model performance can be improved when more prognostic predictors are considered.
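The evaluation protocol described in the abstract (a 70%/30% split, 100 repetitions, mean AUC with a 95% interval) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the names `roc_auc` and `repeated_split_auc` and the `fit` callback are hypothetical, and the percentile-based interval is an assumption, since the abstract does not state how the confidence intervals were computed.

```python
import random
import statistics


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored
    above a random negative case (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_split_auc(rows, fit, n_repeats=100, train_frac=0.7, seed=0):
    """Repeat a random train/validation split and summarize validation AUC.

    rows: list of (features, label) pairs with labels in {0, 1}.
    fit:  hypothetical callback; takes the training rows and returns a
          function mapping features to a risk score.
    Returns (mean AUC, lower bound, upper bound), where the bounds are the
    2.5th and 97.5th percentiles of the AUCs (an assumed interval method).
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(train_frac * len(shuffled))
        train, valid = shuffled[:cut], shuffled[cut:]
        score = fit(train)
        aucs.append(roc_auc([y for _, y in valid],
                            [score(x) for x, _ in valid]))
    aucs.sort()
    lower = aucs[int(0.025 * (len(aucs) - 1))]
    upper = aucs[int(0.975 * (len(aucs) - 1))]
    return statistics.fmean(aucs), lower, upper
```

In practice `fit` would wrap a real learner such as logistic regression, random forest, or XGBoost; here it is left abstract so the sketch stays independent of any particular library.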
Pages: 25-35
Page count: 11
Related papers
50 in total
  • [41] Deep learning models for predicting the survival of patients with chondrosarcoma based on a surveillance, epidemiology, and end results analysis
    Yan, Lizhao
    Gao, Nan
    Ai, Fangxing
    Zhao, Yingsong
    Kang, Yu
    Chen, Jianghai
    Weng, Yuxiong
    FRONTIERS IN ONCOLOGY, 2022, 12
  • [42] Machine learning-based radiomics models for prediction of locoregional recurrence in patients with breast cancer
    Lee, Joongyo
    Yoo, Sang Kyun
    Kim, Kangpyo
    Lee, Byung Min
    Park, Vivian Youngjean
    Kim, Jin Sung
    Kim, Yong Bae
    ONCOLOGY LETTERS, 2023, 26 (04)
  • [43] Differential prognostic implications of gastric adenocarcinoma based on Lauren's classification: a Surveillance, Epidemiology, and End Results (SEER)-based cohort study
    Tang, Dehua
    Ni, Muhan
    Zhu, Hao
    Cao, Jun
    Zhou, Lin
    Shen, Shanshan
    Peng, Chunyan
    Lv, Ying
    Xu, Guifang
    Wang, Lei
    Zou, Xiaoping
    ANNALS OF TRANSLATIONAL MEDICINE, 2021, 9 (08)
  • [44] Prognostic evaluation of segmental ureterectomy combined with chemotherapy in high-grade non-metastatic ureteral cancer: a study based on the SEER database
    Xia, Yu
    Ma, Bin-Bin
    Li, Meng-Yun
    Liu, Xi
    Xu, Dan-Feng
    Huang, Tao
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [45] Individual risk and prognostic value prediction by machine learning for distant metastasis in pulmonary sarcomatoid carcinoma: a large cohort study based on the SEER database and the Chinese population
    Yi, Xinglin
    Xu, Wenhao
    Tang, Guihua
    Zhang, Lingye
    Wang, Kaishan
    Luo, Hu
    Zhou, Xiangdong
    FRONTIERS IN ONCOLOGY, 2023, 13
  • [46] A prognostic nomogram based on lymph node ratio for postoperative vulvar squamous cell carcinoma from the Surveillance, Epidemiology, and End Results database: a retrospective cohort study
    Lei, Lei
    Tan, Liao
    Zhao, Xingping
    Zeng, Fei
    Xu, Dabao
    ANNALS OF TRANSLATIONAL MEDICINE, 2020, 8 (21)
  • [47] Prognostic factors for patients with chondrosarcoma: A survival analysis based on the Surveillance, Epidemiology, and End Results (SEER) database (1973-2012)
    Nie, Zhigang
    Qiang, Lu
    Hao, Peng
    JOURNAL OF BONE ONCOLOGY, 2018, 13: 55-61
  • [48] Prognostic Factors and Nomogram for Malignant Brainstem Ependymoma: A Population-Based Retrospective Surveillance, Epidemiology, and End Results Database Analysis
    Ji, Xiaoyu
    Yang, Siyuan
    Cheng, Dejing
    Zhao, Wenbo
    Sun, Xuebo
    Su, Fang
    CANCER MEDICINE, 2025, 14 (02)
  • [49] Impact of Treatment Delay on the Prognosis of Patients with Ovarian Cancer: A Population-based Study Using the Surveillance, Epidemiology, and End Results Database
    Zhao, Jing
    Chen, Ruiying
    Zhang, Yanli
    Wang, Yu
    Zhu, Haiyan
    JOURNAL OF CANCER, 2024, 15 (02): 473-483
  • [50] An Assessment of the Predictive Performance of Current Machine Learning-Based Breast Cancer Risk Prediction Models: Systematic Review
    Gao, Ying
    Li, Shu
    Jin, Yujing
    Zhou, Lengxiao
    Sun, Shaomei
    Xu, Xiaoqian
    Li, Shuqian
    Yang, Hongxi
    Zhang, Qing
    Wang, Yaogang
    JMIR PUBLIC HEALTH AND SURVEILLANCE, 2022, 8 (12)