Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning

Cited by: 2
Authors
Park, Sang Won [1, 2]
Park, Ye-Lin [3]
Lee, Eun-Gyeong [4]
Chae, Heejung [3, 5]
Park, Phillip [3]
Choi, Dong-Woo [3]
Choi, Yeon Ho [3]
Hwang, Juyeon [3]
Ahn, Seohyun [3]
Kim, Keunkyun [3]
Kim, Woo Jin [1, 6, 7]
Kong, Sun-Young [8, 9]
Jung, So-Youn [4]
Kim, Hyun-Jin [3]
Affiliations
[1] Kangwon Natl Univ, Sch Med, Dept Med Informat, Chunchon 24341, South Korea
[2] Kangwon Natl Univ, Inst Med Sci, Sch Med, Chunchon 24341, South Korea
[3] Natl Canc Ctr, Natl Canc Control Inst, Canc Data Ctr, Goyang 10408, South Korea
[4] Natl Canc Ctr, Ctr Breast Canc, Dept Surg, Goyang 10408, South Korea
[5] Natl Canc Ctr, Ctr Breast Canc, Dept Med Oncol, Goyang 10408, South Korea
[6] Kangwon Natl Univ Hosp, Dept Internal Med, Chunchon 24289, South Korea
[7] Kangwon Natl Univ, Sch Med, Dept Internal Med, Chunchon 24341, South Korea
[8] Natl Canc Ctr, Res Inst, Targeted Therapy Branch, Goyang 10408, South Korea
[9] Natl Canc Ctr, Hospital, Dept Lab Med, Goyang 10408, South Korea
Keywords
breast cancer; artificial intelligence; machine learning; explainable artificial intelligence; mortality; SURVIVAL; DIAGNOSIS; RECORDS
DOI
10.3390/cancers16223799
Chinese Library Classification number
R73 [Oncology]
Subject classification code
100214
Abstract
Background/Objectives: Breast cancer is the most common cancer in women worldwide, and reducing its mortality requires strategic effort. This study aimed to develop a predictive classification model for breast cancer mortality using real-world data that include a range of clinical features.

Methods: A total of 11,286 patients with breast cancer from the National Cancer Center were included in this study. The mortality rate of the total sample was approximately 6.2%. Propensity score matching was used to reduce bias. Several machine learning (ML) models, including extreme gradient boosting (XGB), were applied to 31 clinical features. To enhance model interpretability, we used the SHapley Additive exPlanations (SHAP) method. ML analyses were also performed on a subsample that excluded patients who developed other cancers after breast cancer.

Results: Among the ML models, the XGB model exhibited the highest discriminatory power, with an area under the curve (AUC) of 0.8722 and a specificity of 0.9472. Key predictors in the mortality classification model included occurrence in other organs, age at diagnosis, N stage, T stage, curative radiation treatment, and Ki-67 (%). After excluding patients who developed other cancers after breast cancer, the XGB model remained the best-performing model, with an AUC of 0.8518 and a specificity of 0.9766. The top predictors identified by SHAP in this subsample were similar to those for the overall sample.

Conclusions: Our models provided excellent predictions of breast cancer mortality using real-world data from South Korea. Explainable artificial intelligence methods, such as SHAP, validated the clinical applicability and interpretability of these models.
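The abstract reports two evaluation metrics, AUC and specificity. As a minimal sketch of how such figures are typically computed for a binary mortality classifier, the Python below implements both from their definitions. It is not the authors' code: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration, and the toy values bear no relation to the study data.

```python
from typing import Sequence


def specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-negative rate: TN / (TN + FP), with label 0 meaning 'survived'."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = died, 0 = survived (imbalanced, like the cohort).
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.15, 0.70, 0.40]

# Assumed decision threshold of 0.5; the abstract does not state the one used.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"specificity = {specificity(y_true, y_pred):.4f}")
print(f"AUC         = {roc_auc(y_true, scores):.4f}")
```

AUC is independent of the decision threshold, whereas specificity depends on it. With a mortality rate of about 6.2%, a high specificity can coexist with a low sensitivity, so the two metrics are best read together.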
Pages: 20