Comparative study of machine learning and statistical survival models for enhancing cervical cancer prognosis and risk factor assessment using SEER data

Cited by: 0
Authors
Kolasseri, Anjana Eledath [1]
Venkataramana, B. [1]
Affiliations
[1] Vellore Inst Technol, Sch Adv Sci, Dept Math, Vellore, Tamil Nadu, India
Source
SCIENTIFIC REPORTS | 2024 / Vol. 14 / Issue 01
Keywords
Cervical cancer; Survival analysis; Machine learning; Statistical methods; Prognostic factors; Regression trees
DOI
10.1038/s41598-024-72790-5
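Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification code
07; 0710; 09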
Abstract
Cervical cancer is a common malignant tumor of the female reproductive system and a leading cause of cancer-related death among women worldwide. Survival prediction methods analyze time-to-event outcomes, which are essential in clinical studies. This study aims to bridge the gap between traditional statistical methods and machine learning in survival analysis by identifying which techniques predict survival most effectively, with particular emphasis on improving prediction accuracy and identifying key risk factors for cervical cancer. The study included women diagnosed with cervical cancer between 2013 and 2015, using data from the Surveillance, Epidemiology, and End Results (SEER) database. On this dataset, it assesses the performance of the Weibull model, the Cox proportional hazards model, and Random Survival Forests (RSF) in terms of predictive accuracy and risk factor identification. The findings indicate that machine learning models, particularly RSF, outperform the traditional statistical methods in both predictive accuracy and the identification of crucial prognostic factors, underscoring the advantages of machine learning in handling complex survival data. However, for a survival dataset with a small number of predictors, statistical models should be tried first. The study concludes that RSF models enhance survival analysis with more accurate predictions and insight into survival risk factors, but it highlights the need for larger datasets and further research on model interpretability and clinical applicability.
Pages: 13