A web-based tool for cancer risk prediction for middle-aged and elderly adults using machine learning algorithms and self-reported questions

Times cited: 0
Authors
Xiao, Xingjian [1]
Yi, Xiaohan [1]
Soe, Nyi Nyi [2, 3]
Latt, Phyu Mon [2, 3]
Lin, Luotao [4]
Chen, Xuefen [1]
Song, Hualing [1]
Sun, Bo [5]
Zhao, Hailei [1]
Xu, Xianglong [1, 2, 3, 6, 7]
Affiliations
[1] Shanghai Univ Tradit Chinese Med, Sch Publ Hlth, Shanghai, Peoples R China
[2] Monash Univ, Fac Med Nursing & Hlth Sci, Sch Translat Med, Clayton, Vic, Australia
[3] Alfred Hlth, Melbourne Sexual Hlth Ctr, Artificial Intelligence & Modelling Epidemiol Prog, Carlton, Vic, Australia
[4] Univ New Mexico, Dept Individual Family & Community Educ, Nutr & Dietet Program, Albuquerque, NM USA
[5] Shanghai Univ Tradit Chinese Med, LongHua Hosp, Endoscopy Ctr, Shanghai, Peoples R China
[6] Shanghai Univ Tradit Chinese Med, Bijie Inst, Bijie, Peoples R China
[7] Bijie Dist Ctr Dis Control & Prevent, Doctoral Workstat, Bijie, Peoples R China
Keywords
Cancer; Pan-cancer; Prediction; Web-based; Risk; Co-management; Co-prevention; Middle-aged; China; Machine learning; Health
DOI
10.1016/j.annepidem.2024.12.003
Chinese Library Classification number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Background: By global standards, China is among the countries with higher cancer incidence and mortality rates. Objective: We aimed to create an online cancer risk prediction tool for middle-aged and elderly Chinese adults using machine learning algorithms and self-reported data. Method: Drawing on a cohort of 19,798 participants aged 45 and above from the China Health and Retirement Longitudinal Study (2011-2018), we used nine machine learning algorithms to construct predictive models for various cancers: Logistic Regression (LR), Adaptive Boosting (Adaboost), Support Vector Machine (SVM), Random Forest (RF), Gaussian Naive Bayes (GNB), Gradient Boosting Machine (GBM), Light Gradient Boosting Machine (LGBM), eXtreme Gradient Boosting (XGBoost), and K-Nearest Neighbors (KNN). These algorithms are mainly used for classification and regression tasks. Using non-invasive self-reported predictors covering demographic, educational, marital, lifestyle, health history, and other factors, we focused on predicting the outcome "Cancer or Malignant Tumour". The cancers that can be predicted mainly include lung, breast, cervical, colorectal, gastric, and esophageal cancer, as well as other rare cancers. Results: In the developed tool, MyCancerRisk, the Random Forest algorithm achieved an AUC of 0.75 and an accuracy (ACC) of 0.99 using self-reported variables. Key predictors included age, self-rated health, sleep patterns, household heating sources, childhood health status, living conditions, and smoking habits. Conclusion: MyCancerRisk is intended as a preventative screening tool that encourages individuals to undergo testing and adopt healthier behaviours, thereby reducing the public health impact of cancer. Our study also highlights unconventional predictors, such as housing conditions, that may help refine cancer prediction models.
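Note on the reported metrics: the abstract gives two figures for the Random Forest model, an AUC and an accuracy (ACC). The sketch below shows how the two metrics are computed and why they can differ widely when the outcome is rare. It is not the authors' code or data: the function names, the 0.5 threshold, and the toy labels and scores (1% prevalence) are all assumptions chosen for illustration.

```python
def accuracy(y_true, y_score, threshold=0.5):
    """Fraction of correct predictions after thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in y_score]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)


def auc(y_true, y_score):
    """AUC as the probability that a randomly chosen positive case is
    scored above a randomly chosen negative case (ties count one half)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, NOT from the study: 2 cases among 200 people.
y_true = [1, 1] + [0] * 198
# Assumed risk scores: one case ranks above every non-case,
# the other ranks above only half of them.
y_score = [0.30, 0.05] + [0.10] * 99 + [0.02] * 99

print(f"ACC = {accuracy(y_true, y_score):.2f}")
print(f"AUC = {auc(y_true, y_score):.2f}")
```

In this toy setting no score reaches the threshold, so every person is classified as a non-case; accuracy is still high because non-cases dominate, while AUC reflects only how well cases are ranked above non-cases. For rare outcomes the two metrics are therefore best read together.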
Pages: 27-35
Page count: 9