Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study

Cited by: 3
Authors
Syleouni, Maria-Eleni [1 ,2 ]
Karavasiloglou, Nena [1 ,3 ]
Manduchi, Laura [4 ]
Wanner, Miriam [2 ]
Korol, Dimitri [2 ]
Ortelli, Laura [5 ]
Bordoni, Andrea [5 ]
Rohrmann, Sabine [1 ,2 ,6 ]
Affiliations
[1] Univ Zurich, Epidemiol Biostat & Prevent Inst, Div Chron Dis Epidemiol, Zurich, Switzerland
[2] Univ Hosp Zurich, Canc Registry Zurich Zug Schaffhausen & Schwyz, Zurich, Switzerland
[3] European Food Safety Author, Parma, Italy
[4] Swiss Fed Inst Technol, Med Data Sci, Zurich, Switzerland
[5] Ticino Canc Registry, Publ Hlth Div Canton Ticino, Locarno, Switzerland
[6] Univ Zurich, Epidemiol Biostat & Prevent Inst, Hirschengraben 84, CH-8001 Zurich, Switzerland
Keywords
breast cancer; cancer registry; machine learning; prediction; second cancer; RISK-FACTORS; LOCAL RECURRENCE; PROGNOSIS;
DOI
10.1002/ijc.34568
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast cancer patients from the cancer registry of Zurich, Zug, Schaffhausen, Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset), diagnosed between 2010 and 2018, were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB) were employed for classification. The best-performing model was selected based on the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. This research is a first step towards the development of an automated tool that could assist clinicians in the identification of women at high risk of developing an SBC and potentially preventing it.
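The abstract reports model discrimination as an area under the ROC curve with a 95% confidence interval. As a rough illustration of how such a figure can be computed, the sketch below estimates the AUC from pairwise comparisons and attaches a percentile bootstrap interval. The abstract does not state how the authors derived their intervals, so the bootstrap is an assumption here. The function names (`roc_auc`, `bootstrap_auc_ci`) and all data are hypothetical and synthetic; only the roughly 6% event rate mirrors the abstract.

```python
import random


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the AUC, resampling cases with replacement."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Synthetic stand-in for an external validation set: 300 cases, 6% events.
# These are NOT registry data; scores are drawn from two shifted Gaussians.
rng = random.Random(42)
labels = [1] * 18 + [0] * 282
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

With few events, as in the 6% setting above, the interval tends to be wide. That is consistent with the broader interval the abstract reports for the smaller external validation set.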
Pages: 932-941
Number of pages: 10