Utilizing logistic regression to compare risk factors in disease modeling with imbalanced data: a case study in vitamin D and cancer incidence

Cited by: 1
Authors
Meysami, Mohammad [1]
Kumar, Vijay [1]
Pugh, McKayah [2]
Lowery, Samuel Thomas [3]
Sur, Shantanu [4]
Mondal, Sumona [1]
Greene, James M. [1]
Affiliations
[1] Clarkson Univ, Dept Math, Potsdam, NY 13699 USA
[2] Univ Northern Colorado, Dept Math Sci, Greeley, CO USA
[3] Slippery Rock Univ, Dept Math & Stat, Slippery Rock, PA 16057 USA
[4] Clarkson Univ, Dept Biol, Potsdam, NY 13699 USA
Source
FRONTIERS IN ONCOLOGY | 2023 / Vol. 13
Funding
National Science Foundation (USA);
Keywords
25-hydroxyvitamin D; cancer incidence; imbalanced data; randomized controlled trial; undersampling; UNBALANCED DATA; SEX-DIFFERENCES; HALLMARKS; OBESITY; SMOTE; AGE;
DOI
10.3389/fonc.2023.1227842
Chinese Library Classification number
R73 [Oncology];
Discipline classification code
100214;
Abstract
Imbalanced data, a common challenge in statistical analyses of clinical trial datasets and disease modeling, refers to the scenario where one class significantly outnumbers the other in a binary classification problem. This imbalance can bias model performance toward the majority class and distort the apparent relative importance of predictive variables. Despite its prevalence, the existing literature lacks comprehensive studies that elucidate methodologies to handle imbalanced data effectively. In this study, we discuss the binary logistic model and its limitations when dealing with imbalanced data, as model performance tends to be biased towards the majority class. We propose a novel approach to addressing imbalanced data and apply it to publicly available data from the VITAL trial, a large-scale clinical trial that examines the effects of vitamin D and omega-3 fatty acids, to investigate the relationship between vitamin D and cancer incidence in sub-populations based on race/ethnicity and demographic factors such as body mass index (BMI), age, and sex. Our results demonstrate a significant improvement in cancer incidence prediction after our undersampling method is applied to the data set. Both epidemiological and laboratory studies have suggested that vitamin D may lower the occurrence and death rate of cancer, but inconsistent and conflicting findings have been reported due to the difficulty of conducting large-scale clinical trials. We also utilize logistic regression within each ethnic sub-population to determine the impact of demographic factors on cancer incidence, with a particular focus on the role of vitamin D. This study provides a framework for using classification models to understand relative variable importance when dealing with imbalanced data.
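The majority-class bias described in the abstract, and the general idea of undersampling, can be illustrated with a minimal sketch. This is plain random undersampling, not the authors' proposed method, which this record does not detail. The function name `random_undersample`, the 950/50 class split, and the seed are hypothetical choices made for illustration; no VITAL data are used.

```python
import random
from collections import Counter


def random_undersample(rows, labels, seed=0):
    """Return a class-balanced subset by randomly dropping majority-class rows.

    Generic random undersampling (an assumption for illustration),
    not the specific method proposed in the paper.
    """
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    # Every class is reduced to the size of the smallest class.
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    keep.sort()
    return [rows[i] for i in keep], [labels[i] for i in keep]


# Hypothetical toy labels: 950 non-cases (0) and 50 cases (1).
labels = [0] * 950 + [1] * 50
rows = list(range(len(labels)))

# A trivial model that always predicts the majority class (0) looks
# accurate on the imbalanced data but identifies none of the cases.
accuracy = sum(y == 0 for y in labels) / len(labels)
true_positives = 0  # the trivial model never predicts class 1
recall = true_positives / sum(y == 1 for y in labels)

balanced_rows, balanced_labels = random_undersample(rows, labels)
counts = Counter(balanced_labels)
```

On the toy labels, the always-negative model reaches high accuracy with zero recall, which is why accuracy alone can mislead on imbalanced data. After undersampling, both classes have equal counts, so a classifier such as logistic regression fitted on `balanced_rows` no longer gains from simply predicting the majority class.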
Pages: 14