Cancer Metastasis Prediction and Genomic Biomarker Identification through Machine Learning and eXplainable Artificial Intelligence in Breast Cancer Research

Cited by: 15
Authors
Yagin, Burak [1]
Yagin, Fatma Hilal [1]
Colak, Cemil [1]
Inceoglu, Feyza [2]
Kadry, Seifedine [3, 4, 5]
Kim, Jungeun [6]
Affiliations
[1] Inonu Univ, Fac Med, Dept Biostat & Med Informat, TR-44280 Malatya, Turkiye
[2] Malatya Turgut Ozal Univ, Fac Med, Dept Biostat, TR-44090 Malatya, Turkiye
[3] Noroff Univ Coll, Dept Appl Data Sci, N-4612 Kristiansand, Norway
[4] Ajman Univ, Artificial Intelligence Res Ctr AIRC, Ajman 346, U Arab Emirates
[5] Lebanese Amer Univ, Dept Elect & Comp Engn, Byblos 36, Lebanon
[6] Kongju Natl Univ, Dept Software, Cheonan 31080, South Korea
Keywords
breast cancer metastasis; machine learning algorithms; genomic biomarkers; eXplainable artificial intelligence; SHAP; EXPRESSION; ASSOCIATION; PROGNOSIS
DOI
10.3390/diagnostics13213314
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification codes
1002; 100201
Abstract
Aim: This research presents a model that combines machine learning (ML) techniques with eXplainable artificial intelligence (XAI) to predict breast cancer (BC) metastasis and to reveal important genomic biomarkers in patients with metastasis.
Method: A total of 98 primary BC samples were analyzed, including 34 samples from patients who developed distant metastases within a 5-year follow-up period and 44 samples from patients who remained disease-free for at least 5 years after diagnosis. Genomic data were subjected to biostatistical analysis, followed by elastic net feature selection, which identified a restricted number of genomic biomarkers associated with BC metastasis. Light gradient boosting machine (LightGBM), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), gradient boosting trees (GBT), and adaptive boosting (AdaBoost) algorithms were used for prediction. To assess the models' predictive abilities, accuracy, F1 score, precision, recall, area under the ROC curve (AUC), and Brier score were calculated as performance evaluation metrics. To promote interpretability and overcome the "black box" problem of ML models, the SHapley Additive exPlanations (SHAP) method was employed.
Results: The LightGBM model outperformed the other models, with an accuracy of 96% and an AUC of 99.3%. In addition to the biostatistical evaluation, the XAI-based SHAP results showed that increased expression levels of TSPYL5, ATP5E, CA9, NUP210, SLC37A1, ARIH1, PSMD7, UBQLN1, PRAME, and UBE2T (p <= 0.05) were associated with an increased incidence of BC metastasis. Decreased expression levels of CACTIN, TGFB3, SCUBE2, ARL4D, OR1F1, ALDH4A1, PHF1, and CROCC (p <= 0.05) were also found to increase the risk of metastasis in BC.
Conclusion: The findings of this study may help prevent disease progression and metastasis and may improve clinical outcomes by supporting customized treatment recommendations for BC patients.
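The six evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names `roc_auc` and `evaluate`, the 0.5 decision threshold, and the eight toy labels and probabilities are all assumptions chosen for illustration and are unrelated to the study data.

```python
from typing import Dict, Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive scores above a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], y_prob: Sequence[float],
             threshold: float = 0.5) -> Dict[str, float]:
    """Compute accuracy, precision, recall, F1, AUC, and Brier score."""
    # Hard class predictions from probabilities (threshold is an assumption).
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for y, yh in pairs if y == 1 and yh == 1)
    tn = sum(1 for y, yh in pairs if y == 0 and yh == 0)
    fp = sum(1 for y, yh in pairs if y == 0 and yh == 1)
    fn = sum(1 for y, yh in pairs if y == 1 and yh == 0)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Brier score: mean squared gap between predicted probability and outcome.
    brier = sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, y_prob),
        "brier": brier,
    }


# Toy example (made-up values): 1 = metastasis, 0 = disease-free.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_prob = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]
metrics = evaluate(y_true, y_prob)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC and the Brier score are computed directly from the predicted probabilities; lower Brier scores indicate better-calibrated predictions.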
Pages: 12