Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort

Cited by: 4
Authors
Tang, Mo [1]
Gao, Lihao [2 ]
He, Bin [1 ]
Yang, Yufei [1 ]
Affiliations
[1] China Acad Chinese Med Sci, Oncol Dept, Xiyuan Hosp, Beijing, Peoples R China
[2] Baidu Inc, Smart City Business Unit, 51 Dezhen Rd, Beijing 100091, Peoples R China
Source
CANCER MANAGEMENT AND RESEARCH | 2022 / Volume 14
Keywords
colon cancer; machine learning; extreme gradient boosting; prognostic; ARTIFICIAL-INTELLIGENCE; COLORECTAL-CANCER; SURVIVAL; REGRESSION; CLASSIFICATION; OUTCOMES; TOOL
DOI
10.2147/CMAR.S340739
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Purpose: This study aimed to develop machine learning (ML)-based prognostic prediction models for non-metastatic colon cancer (CRC) that provide a precise quantitative risk assessment and serve as an assistive method for treatment strategy development. It also investigated whether nonlinear methods can improve prediction accuracy compared with linear methods.

Patients and Methods: A cancer-specific survival (CSS) model was constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms. It was trained on Surveillance, Epidemiology, and End Results (SEER) data for 15,254 patients with non-metastatic CRC, split into training (70%) and internal validation (30%) datasets, and externally validated with an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. The experiments for each model were performed 100 times to obtain average scores and 95% confidence intervals. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC).

Results: The XGBoost approach showed the highest AUC values of 0.86 (0.84-0.88), 0.82 (0.81-0.83), and 0.81 (0.79-0.82) for the one-, three-, and five-year CSS cohorts, respectively, along with a relatively high generalization ability. The XGBoost approach also performed best for the R&M model, with AUC values of 0.71 (0.64-0.79), 0.79 (0.74-0.86), and 0.89 (0.82-0.95) for the one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance differed between the CSS and R&M models, and higher model accuracy was associated with more prognostic predictors.

Conclusion: Three ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The nonlinear XGBoost approach performed best, suggesting that it can be used to quantify prognostic risk. Model performance can also be improved when more prognostic predictors are considered.
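
The evaluation protocol described in the abstract (70%/30% split, 100 repetitions, AUC reported as an average with a 95% interval) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' code: the abstract does not say how the confidence intervals were computed, so a percentile interval across runs is assumed here; the function names (`roc_auc`, `split_70_30`, `repeated_auc`, `summarize`) are hypothetical; and the demo uses synthetic labels with stand-in risk scores rather than a trained logistic regression, random forest, or XGBoost model.

```python
import random
import statistics


def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def split_70_30(n_cases, rng):
    """Shuffle case indices and return (training 70%, validation 30%)."""
    idx = list(range(n_cases))
    rng.shuffle(idx)
    cut = int(round(0.7 * n_cases))
    return idx[:cut], idx[cut:]


def repeated_auc(labels, scores, n_runs=100, seed=0):
    """Validation-set AUC over repeated random splits.

    Assumption: a real pipeline would refit the model on each training
    split; here the scores are fixed stand-ins, so only the validation
    part of each split is used.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_runs):
        _, val_idx = split_70_30(len(labels), rng)
        aucs.append(roc_auc([labels[i] for i in val_idx],
                            [scores[i] for i in val_idx]))
    return aucs


def summarize(aucs):
    """Mean AUC and an assumed percentile-based 95% interval across runs."""
    cuts = statistics.quantiles(aucs, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return statistics.mean(aucs), cuts[0], cuts[-1]


def demo():
    # Synthetic data only: binary outcome plus a noisy stand-in risk score.
    rng = random.Random(42)
    labels = [rng.randint(0, 1) for _ in range(200)]
    scores = [y + rng.gauss(0.0, 1.0) for y in labels]
    mean_auc, low, high = summarize(repeated_auc(labels, scores))
    print(f"AUC {mean_auc:.2f} ({low:.2f}-{high:.2f})")


if __name__ == "__main__":
    demo()
```

The printed format mirrors how the abstract reports results, for example 0.86 (0.84-0.88), but the numbers produced by this sketch come from synthetic data and carry no clinical meaning.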
Pages: 25-35
Number of pages: 11