Machine Learning-Based Prognostic Prediction Models of Non-Metastatic Colon Cancer: Analyses Based on Surveillance, Epidemiology and End Results Database and a Chinese Cohort

Times cited: 4
Authors
Tang, Mo [1 ]
Gao, Lihao [2 ]
He, Bin [1 ]
Yang, Yufei [1 ]
Affiliations
[1] China Acad Chinese Med Sci, Oncol Dept, Xiyuan Hosp, Beijing, Peoples R China
[2] Baidu Inc, Smart City Business Unit, 51 Dezhen Rd, Beijing 100091, Peoples R China
Source
CANCER MANAGEMENT AND RESEARCH | 2022 / Vol. 14
Keywords
colon cancer; machine learning; extreme gradient boosting; prognostic; ARTIFICIAL-INTELLIGENCE; COLORECTAL-CANCER; SURVIVAL; REGRESSION; CLASSIFICATION; OUTCOMES; TOOL;
DOI
10.2147/CMAR.S340739
Chinese Library Classification (CLC) number
R73 [Oncology];
Discipline classification code
100214;
Abstract
Purpose: The present study aimed to develop machine learning (ML)-based prognostic prediction models for non-metastatic colon cancer (CRC) that provide precise quantitative risk assessment and can assist in developing treatment strategies. It also investigated whether nonlinear methods improve prediction accuracy compared with linear methods.
Patients and Methods: Cancer-specific survival (CSS) models were constructed using logistic regression, extreme gradient boosting (XGBoost), and random forest algorithms. They were trained on Surveillance, Epidemiology, and End Results (SEER) data from 15,254 patients with non-metastatic CRC (split into training [70%] and internal validation [30%] datasets) and externally validated in an outpatient cohort of 311 cases from Xiyuan Hospital in China. A Chinese cohort was also used to develop recurrence and metastasis (R&M) models for CRC patients. Each model experiment was repeated 100 times to obtain mean scores and 95% confidence intervals. Model performance was evaluated using the area under the receiver operating characteristic curve (AUC).
Results: XGBoost achieved the highest AUC values for the one-, three-, and five-year CSS cohorts: 0.86 (0.84-0.88), 0.82 (0.81-0.83), and 0.81 (0.79-0.82), respectively, with relatively high generalization ability. XGBoost also performed best for the R&M models, with AUC values of 0.71 (0.64-0.79), 0.79 (0.74-0.86), and 0.89 (0.82-0.95) for the one-, three-, and five-year R&M cohorts, respectively. The rankings of predictor importance differed between the CSS and R&M models, and higher model accuracy was associated with the inclusion of more prognostic predictors.
Conclusion: Three ML algorithms for developing prognostic prediction models for non-metastatic CRC were compared. The nonlinear XGBoost approach showed the best predictive performance, suggesting that it can be used to quantify prognostic risk. The results also showed that model performance improved when more prognostic predictors were considered.
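The evaluation protocol summarized in the abstract (a 70/30 train/validation split, AUC as the metric, and 100 repetitions reported as a mean with a 95% confidence interval) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. How the splits were drawn and how the intervals were computed are not specified here, so the sketch assumes unstratified random splits and an empirical percentile interval, and it treats each horizon (one, three, or five years) as a separate binary outcome. `fit_single_feature` and `make_toy_cohort` are hypothetical helpers: the toy learner stands in for logistic regression, XGBoost, or random forest, and the data are synthetic, not SEER or Xiyuan Hospital records.

```python
import random
import statistics


def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney form): the probability that a randomly
    chosen positive case scores higher than a randomly chosen negative one,
    counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def repeated_split_auc(records, fit, n_repeats=100, train_frac=0.7, seed=0):
    """Repeat a random train/validation split n_repeats times and return the
    mean validation AUC with an empirical 95% interval (2.5th to 97.5th
    percentiles of the repeated AUCs; an ASSUMED interval method)."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_repeats):
        shuffled = list(records)
        rng.shuffle(shuffled)  # assumption: unstratified random split
        cut = int(len(shuffled) * train_frac)
        train, valid = shuffled[:cut], shuffled[cut:]
        predict = fit(train)  # fit returns a scoring function
        labels = [y for _, y in valid]
        scores = [predict(x) for x, _ in valid]
        aucs.append(auc(labels, scores))
    cuts = statistics.quantiles(aucs, n=40)  # 39 cut points, 2.5% apart
    return statistics.mean(aucs), (cuts[0], cuts[-1])


def fit_single_feature(train):
    """HYPOTHETICAL stand-in learner (not logistic regression, XGBoost or
    random forest): scores a case by its single feature, with the sign chosen
    so that cases with the event tend to score higher."""
    pos = [x for x, y in train if y == 1]
    neg = [x for x, y in train if y == 0]
    sign = 1.0 if statistics.mean(pos) >= statistics.mean(neg) else -1.0
    return lambda x: sign * x


def make_toy_cohort(n=500, event_rate=0.3, seed=42):
    """HYPOTHETICAL synthetic (feature, label) pairs; label 1 = event by the
    chosen horizon. Not real patient data."""
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        y = 1 if rng.random() < event_rate else 0
        records.append((y + rng.gauss(0.0, 1.0), y))
    return records


if __name__ == "__main__":
    mean_auc, (lo, hi) = repeated_split_auc(make_toy_cohort(), fit_single_feature)
    print(f"mean AUC {mean_auc:.2f} (95% interval {lo:.2f}-{hi:.2f})")
```

In practice, `fit` would wrap a real learner (for example, returning a fitted model's predicted probability). Because the random generator is used only for shuffling, reusing the same `seed` for every algorithm gives each one the same 100 splits, so the resulting AUCs are compared on paired data.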
Pages: 25-35
Page count: 11
Related papers
50 in total
  • [21] Prediction of irinotecan toxicity in metastatic colorectal cancer patients based on machine learning models with pharmacokinetic parameters
    Oyaga-Iriarte, Esther
    Insausti, Asier
    Sayar, Onintza
    Aldaz, Azucena
    JOURNAL OF PHARMACOLOGICAL SCIENCES, 2019, 140 (01) : 20 - 25
  • [22] A prognostic nomogram for distal bile duct cancer from Surveillance, Epidemiology, and End Results (SEER) database based on the STROBE compliant
    Zhao, Ye-Yu
    Chen, Si-Hai
    Wan, Qin-Si
    MEDICINE, 2019, 98 (46) : e17903
  • [23] Development and validation of a prognostic nomogram for predicting liver metastasis in thyroid cancer: a study based on the surveillance, epidemiology, and end results database
    Ruan, Cong
    Chen, Xiaogang
    COMPUTER METHODS IN BIOMECHANICS AND BIOMEDICAL ENGINEERING, 2024
  • [24] Models for Predicting Early Death in Patients With Stage IV Esophageal Cancer: A Surveillance, Epidemiology, and End Results-Based Cohort Study
    Shi, Min
    Zhai, Guo-qing
    CANCER CONTROL, 2022, 29
  • [25] Deep learning models for predicting the survival of patients with hepatocellular carcinoma based on a surveillance, epidemiology, and end results (SEER) database analysis
    Wang, Shoucheng
    Shao, Mingyi
    Fu, Yu
    Zhao, Ruixia
    Xing, Yunfei
    Zhang, Liujie
    Xu, Yang
    SCIENTIFIC REPORTS, 2024, 14 (01)
  • [26] Prognostic Factors and Nomogram for Choroid Plexus Tumors: A Population-Based Retrospective Surveillance, Epidemiology, and End Results Database Analysis
    Bhutada, Abhishek S.
    Adhikari, Srijan
    Cuoco, Joshua A.
    In, Alexander
    Rogers, Cara M.
    Jane Jr, John A.
    Marvin, Eric A.
    CANCERS, 2024, 16 (03)
  • [27] Prognostic factors for sublingual gland carcinoma: a population-based Surveillance, Epidemiology and End Results database study
    Qin, Gang
    Wu, Lei
    Li, Chengxia
    Zhang, Qian
    An, Zhongjun
    Qin, Jianhua
    JOURNAL OF INTERNATIONAL MEDICAL RESEARCH, 2023, 51 (11)
  • [28] Risk, Predictive Factors, and Nomogram of Liver Metastatic Gastroesophageal Junction Cancer: A New Study Based on the Surveillance, Epidemiology, and End Results Database
    Tian, Chenrui
    Li, Yang
    Li, Min
    CUREUS JOURNAL OF MEDICAL SCIENCE, 2024, 16 (06)
  • [29] Prognostic factors of pancreatic tumors in children and adolescents: a population study based on the surveillance, epidemiology, and end results database
    Qi, Xianzhong
    Zhou, Bi
    Liang, Fuhua
    Wang, Xinxin
    BMC GASTROENTEROLOGY, 2024, 24 (01)
  • [30] Quantitative tumor heterogeneity MRI profiling improves machine learning-based prognostication in patients with metastatic colon cancer
    Daye, Dania
    Tabari, Azadeh
    Kim, Hyunji
    Chang, Ken
    Kamran, Sophia C.
    Hong, Theodore S.
    Kalpathy-Cramer, Jayashree
    Gee, Michael S.
    EUROPEAN RADIOLOGY, 2021, 31 (08) : 5759 - 5767