Predicting second breast cancer among women with primary breast cancer using machine learning algorithms, a population-based observational study

Cited by: 3
Authors
Syleouni, Maria-Eleni [1, 2]
Karavasiloglou, Nena [1, 3]
Manduchi, Laura [4 ]
Wanner, Miriam [2 ]
Korol, Dimitri [2 ]
Ortelli, Laura [5 ]
Bordoni, Andrea [5 ]
Rohrmann, Sabine [1, 2, 6]
Affiliations
[1] Univ Zurich, Epidemiol Biostat & Prevent Inst, Div Chron Dis Epidemiol, Zurich, Switzerland
[2] Univ Hosp Zurich, Canc Registry Zurich Zug Schaffhausen & Schwyz, Zurich, Switzerland
[3] European Food Safety Author, Parma, Italy
[4] Swiss Fed Inst Technol, Med Data Sci, Zurich, Switzerland
[5] Ticino Canc Registry, Publ Hlth Div Canton Ticino, Locarno, Switzerland
[6] Univ Zurich, Epidemiol Biostat & Prevent Inst, Hirschengraben 84, CH-8001 Zurich, Switzerland
Keywords
breast cancer; cancer registry; machine learning; prediction; second cancer; RISK-FACTORS; LOCAL RECURRENCE; PROGNOSIS
DOI
10.1002/ijc.34568
Chinese Library Classification number
R73 [Oncology]
Discipline classification code
100214
Abstract
Breast cancer survivors often experience recurrence or a second primary cancer. We developed an automated approach to predict the occurrence of any second breast cancer (SBC) using patient-level data and explored the generalizability of the models with an external validation data source. Breast cancer patients from the cancer registry of Zurich, Zug, Schaffhausen, Schwyz (N = 3213; training dataset) and the cancer registry of Ticino (N = 1073; external validation dataset), diagnosed between 2010 and 2018, were used for model training and validation, respectively. Machine learning (ML) methods, namely a feed-forward neural network (ANN), logistic regression, and extreme gradient boosting (XGB) were employed for classification. The best-performing model was selected based on the receiver operating characteristic (ROC) curve. Key characteristics contributing to a high SBC risk were identified. SBC was diagnosed in 6% of all cases. The most important features for SBC prediction were age at incidence, year of birth, stage, and extent of the pathological primary tumor. The ANN model had the highest area under the ROC curve with 0.78 (95% confidence interval [CI] 0.75-0.82) in the training data and 0.70 (95% CI 0.61-0.79) in the external validation data. Investigating the generalizability of different ML algorithms, we found that the ANN generalized better than the other models on the external validation data. This research is a first step towards the development of an automated tool that could assist clinicians in the identification of women at high risk of developing an SBC and potentially preventing it.
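The abstract reports each model's area under the ROC curve together with a 95% confidence interval. As a rough illustration of what such a figure involves, the sketch below computes an AUC from predicted scores and a percentile-bootstrap interval around it. The abstract does not say how the authors derived their intervals, so the bootstrap is an assumption here, and the function names, the synthetic cohort, and the score distributions are all hypothetical and are not the registry data.

```python
import random


def roc_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs where the positive
    case receives the higher score; tied pairs count as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=200, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the AUC (assumed method)."""
    rng = random.Random(seed)
    n = len(labels)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        # Skip resamples that contain only one class; AUC is undefined there.
        if 0 < sum(ys) < n:
            aucs.append(roc_auc(ys, [scores[i] for i in idx]))
    aucs.sort()
    lower = aucs[int((alpha / 2) * n_boot)]
    upper = aucs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Synthetic cohort: roughly 6% positives, echoing the SBC rate in the abstract.
rng = random.Random(42)
labels = [1 if rng.random() < 0.06 else 0 for _ in range(500)]
if sum(labels) == 0:
    labels[0] = 1  # guard so the toy cohort always has a positive case
# Hypothetical model scores: positives are shifted upward on average.
scores = [rng.gauss(1.0 if y == 1 else 0.0, 1.0) for y in labels]

auc = roc_auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(f"AUC = {auc:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

With a rare outcome such as a 6% event rate, the interval is driven mostly by the small number of positive cases, which is one plausible reason the interval on the smaller external validation set is wider than on the training set.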
Pages: 932-941
Page count: 10