Mortality Prediction Modeling for Patients with Breast Cancer Based on Explainable Machine Learning

Cited by: 2

Authors:

Park, Sang Won ^{[1,2]}

Park, Ye-Lin ^{[3]}

Lee, Eun-Gyeong ^{[4]}

Chae, Heejung ^{[3,5]}

Park, Phillip ^{[3]}

Choi, Dong-Woo ^{[3]}

Choi, Yeon Ho ^{[3]}

Hwang, Juyeon ^{[3]}

Ahn, Seohyun ^{[3]}

Kim, Keunkyun ^{[3]}

Kim, Woo Jin ^{[1,6,7]}

Kong, Sun-Young ^{[8,9]}

Jung, So-Youn ^{[4]}

Kim, Hyun-Jin ^{[3]}

Affiliations:

[1] Kangwon Natl Univ, Sch Med, Dept Med Informat, Chunchon 24341, South Korea

[2] Kangwon Natl Univ, Inst Med Sci, Sch Med, Chunchon 24341, South Korea

[3] Natl Canc Ctr, Natl Canc Control Inst, Canc Data Ctr, Goyang 10408, South Korea

[4] Natl Canc Ctr, Ctr Breast Canc, Dept Surg, Goyang 10408, South Korea

[5] Natl Canc Ctr, Ctr Breast Canc, Dept Med Oncol, Goyang 10408, South Korea

[6] Kangwon Natl Univ Hosp, Dept Internal Med, Chunchon 24289, South Korea

[7] Kangwon Natl Univ, Sch Med, Dept Internal Med, Chunchon 24341, South Korea

[8] Natl Canc Ctr, Res Inst, Targeted Therapy Branch, Goyang 10408, South Korea

[9] Natl Canc Ctr, Hospital, Dept Lab Med, Goyang 10408, South Korea

Source:

CANCERS | 2024 / Volume 16 / Issue 22

Keywords:

breast cancer; artificial intelligence; machine learning; explainable artificial intelligence; mortality; SURVIVAL; DIAGNOSIS; RECORDS;

DOI:

10.3390/cancers16223799

Chinese Library Classification number:

R73 [Oncology];

Discipline classification code:

100214 ;

Abstract:

Background/Objectives: Breast cancer is the most common cancer in women worldwide, and reducing its mortality requires strategic efforts. This study aimed to develop a predictive classification model for breast cancer mortality using real-world data that include various clinical features. Methods: A total of 11,286 patients with breast cancer from the National Cancer Center were included in this study. The mortality rate of the total sample was approximately 6.2%. Propensity score matching was used to reduce bias. Several machine learning (ML) models, including extreme gradient boosting (XGB), were applied to 31 clinical features. To enhance model interpretability, we used the SHapley Additive exPlanations (SHAP) method. ML analyses were also performed on a subsample that excluded patients who developed other cancers after breast cancer. Results: Among the ML models, the XGB model exhibited the highest discriminatory power, with an area under the curve (AUC) of 0.8722 and a specificity of 0.9472. Key predictors of the mortality classification model included occurrence in other organs, age at diagnosis, N stage, T stage, curative radiation treatment, and Ki-67 (%). Even after excluding patients who developed other cancers after breast cancer, the XGB model remained the best-performing model, with an AUC of 0.8518 and a specificity of 0.9766. Additionally, the top predictors from SHAP were similar to those for the overall sample. Conclusions: Our models provided excellent predictions of breast cancer mortality using real-world data from South Korea. Explainable artificial intelligence methods, such as SHAP, validated the clinical applicability and interpretability of these models.
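
The abstract reports two evaluation metrics, area under the curve (AUC) and specificity. As a rough illustration of what these numbers measure, the sketch below computes both on a small made-up set of labels and scores. The data, the 0.5 decision threshold, and the function names are assumptions chosen for the example; none of them come from the study, and this is not the authors' evaluation code.

```python
def specificity(y_true, y_pred):
    """True-negative rate: TN / (TN + FP), computed over the true negatives."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tn / (tn + fp)


def roc_auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive case receives
    a higher score than a randomly chosen negative case (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical): 1 = died, 0 = survived; scores are predicted risks.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.60, 0.15, 0.70, 0.40]

# Assumed decision threshold of 0.5 for turning risks into class labels.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

print(f"specificity = {specificity(y_true, y_pred):.4f}")
print(f"AUC         = {roc_auc(y_true, scores):.4f}")
```

With an outcome as rare as the roughly 6.2% mortality rate reported above, a high specificity on its own says little about how many deaths are detected, which is one reason to read it alongside a threshold-free measure such as AUC.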

Pages: 20
