Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer

Cited by: 35
Authors
Gong, Xian [1, 2]
Zheng, Bin [1, 2]
Xu, Guobing [1, 2]
Chen, Hao [1, 2]
Chen, Chun [1, 2]
Affiliations
[1] Fujian Med Univ Union Hosp, Dept Thorac Surg, 29 Xinquan Rd, Fuzhou 350001, Peoples R China
[2] Fujian Med Univ, Fujian Prov Univ, Key Lab Cardiothorac Surg, Fuzhou, Peoples R China
Keywords
Esophageal cancer (EC); survival; machine learning (ML); Surveillance, Epidemiology, and End Results (SEER)
DOI
10.21037/jtd-21-1107
Chinese Library Classification number
R56 [Diseases of the respiratory system and chest]
Subject classification code
Abstract
Background: Accurate prognostic estimation for esophageal cancer (EC) patients plays an important role in clinical decision-making. The objective of this study was to develop an effective model to predict the 5-year survival status of EC patients using machine learning (ML) algorithms.
Methods: We retrieved records of patients diagnosed with EC between 2010 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) Program, including 24 features. A total of 8 ML models were applied to the selected dataset to classify the EC patients by 5-year survival status: 3 newly developed gradient boosting models (GBM; XGBoost, CatBoost, and LightGBM), 2 commonly used tree-based models [gradient boosting decision trees (GBDT) and random forest (RF)], and 3 other ML models [artificial neural networks (ANN), naive Bayes (NB), and support vector machines (SVM)]. Model performance was measured with 5-fold cross-validation.
Results: After excluding records with missing data, the final study population comprised 10,588 patients. Feature selection was conducted based on the chi-square (χ²) test; however, the experimental results showed that the complete dataset predicted outcomes better than the dataset with non-significant features removed. Among the 8 models, XGBoost had the best performance [area under the receiver operating characteristic (ROC) curve (AUC): 0.852 for XGBoost, 0.849 for CatBoost, 0.850 for LightGBM, 0.846 for GBDT, 0.838 for RF, 0.844 for ANN, 0.833 for NB, and 0.789 for SVM]. The accuracy and logistic loss of XGBoost were 0.875 and 0.301, respectively, which were also the best among the 8 models. In the XGBoost model, SHapley Additive exPlanations (SHAP) values were calculated, and the results indicated that four features (reason no cancer-directed surgery, Surg Prim Site, age, and stage group) had the greatest impact on predicting the outcomes.
Conclusions: The XGBoost model and the complete dataset can be used to construct an accurate prognostic model for patients diagnosed with EC, which may be applicable in clinical practice in the future.
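The abstract reports AUC, accuracy, and logistic loss averaged over 5-fold cross-validation. The sketch below illustrates how those three metrics can be computed over folds, using only the Python standard library. It is not the authors' code: the helper names (`roc_auc`, `log_loss`, `accuracy`, `stratified_folds`, `cross_validate`, `prevalence_baseline`) are hypothetical, the stratified split is an assumption (the abstract says only "5-fold"), and the prevalence baseline is a stand-in for a real classifier such as XGBoost.

```python
import math
import random


def roc_auc(y_true, scores):
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def accuracy(y_true, probs, threshold=0.5):
    """Fraction of cases where the thresholded probability matches the label."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, probs))
    return hits / len(y_true)


def stratified_folds(y, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds.

    Stratification is an assumption of this sketch, not a detail from the abstract.
    """
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


def prevalence_baseline(X_train, y_train, X_test):
    """Placeholder model: predicts the training-set positive rate for every test row."""
    rate = sum(y_train) / len(y_train)
    return [rate] * len(X_test)


def cross_validate(X, y, fit_predict, k=5, seed=0):
    """Return the mean AUC, accuracy, and log loss of fit_predict over k folds."""
    aucs, accs, losses = [], [], []
    for train, test in stratified_folds(y, k, seed):
        probs = fit_predict([X[i] for i in train], [y[i] for i in train],
                            [X[i] for i in test])
        y_test = [y[i] for i in test]
        aucs.append(roc_auc(y_test, probs))
        accs.append(accuracy(y_test, probs))
        losses.append(log_loss(y_test, probs))
    return {"auc": sum(aucs) / k, "accuracy": sum(accs) / k, "log_loss": sum(losses) / k}
```

To compare models in this setup, one would pass a different `fit_predict` callable per model and keep `k` and `seed` fixed so that every model is scored on the same folds.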
Pages: 6240 / +
Number of pages: 13