Machine learning in public health informatics: Evidence that complex sampling structures may not be needed for prediction models with imbalanced outcomes

Times cited: 0
Authors
Si, Zhengye [1]
Li, Jinpu [1]
Leary, Emily [1]
Affiliation
[1] Univ Missouri, Sch Med, Dept Orthopaed Surg, Thompson Lab Regenerat Orthopaed, 1100 Virginia Ave, Columbia, MO 65211 USA
Keywords
Public health informatics; Machine learning; National surveys; Data collection; Methods; REGULARIZATION PATHS; NATIONAL-HEALTH
DOI
10.1016/j.annepidem.2024.12.016
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification codes
1004; 120402
Abstract
Purpose: The objective of this study is to investigate the predictive ability of machine learning models for imbalanced outcomes from national survey data without the use of sampling weights.
Methods: We evaluated the predictive performance of machine learning models on imbalanced outcomes from the US National Health and Nutrition Examination Survey (USNHANES) without using sampling weights. Four machine learning models (support vector machine, random forest, least absolute shrinkage and selection operator (LASSO) regression, and deep neural network) were compared with a logistic model that incorporates the survey's complex sampling design. Three resampling methods (oversampling, undersampling, and combined) were used to address class imbalance during model training.
Results: Balanced accuracy was similar across all models (0.72 to 0.76), and specificity was lower than sensitivity for every model except random forest. The support vector machine and deep neural network performed best in terms of sensitivity (0.79 to 0.83), while random forest had the highest specificity (0.86 to 0.96), with one exception. PR-AUC and Brier scores were low, ranging from 0.2529 to 0.3313 (higher is better) and from 0.1005 to 0.3245 (lower is better), respectively.
Conclusions: Overall, the machine learning models had predictive capacity similar to that of the recommended methods, which integrate the complex sampling design, for predicting osteoarthritis occurrence in USNHANES data.
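The abstract names its resampling strategies and evaluation metrics but does not define them. The block below is a minimal, standard-library-only Python sketch of how these quantities are commonly computed, assuming binary outcomes coded 1 (osteoarthritis) and 0 (no osteoarthritis). It is not the authors' code: the function names (`classification_metrics`, `brier_score`, `average_precision`, `resample`) are hypothetical, PR-AUC is estimated here as step-wise average precision (the paper does not state which estimator it used), and the midpoint target for the "combined" strategy is an assumption. Like the machine learning models in the study, nothing here applies survey weights.

```python
import random


def classification_metrics(y_true, y_pred):
    """Sensitivity, specificity and balanced accuracy for 0/1 labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)  # recall on the (minority) case class
    specificity = tn / (tn + fp)  # recall on the (majority) non-case class
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        # Averaging the two keeps the majority class from dominating the score.
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def brier_score(y_true, probs):
    """Mean squared gap between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve (higher is better).

    Sums precision * (increase in recall) over distinct score thresholds;
    samples with tied scores are counted together at one threshold.
    """
    pairs = sorted(zip(scores, y_true), key=lambda s: -s[0])
    n_pos = sum(y_true)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall, precision = tp / n_pos, tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def resample(X, y, method, seed=0):
    """Balance classes by random "over"-, "under"- or "combined" resampling."""
    rng = random.Random(seed)
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    if method == "over":
        target = len(majority)
    elif method == "under":
        target = len(minority)
    elif method == "combined":
        target = (len(minority) + len(majority)) // 2  # assumed midpoint target
    else:
        raise ValueError(f"unknown method: {method}")

    def draw(idx):
        if len(idx) >= target:
            return rng.sample(idx, target)  # keep a random subset, no replacement
        return idx + rng.choices(idx, k=target - len(idx))  # duplicate with replacement

    keep = draw(minority) + draw(majority)
    rng.shuffle(keep)
    return [X[i] for i in keep], [y[i] for i in keep]


if __name__ == "__main__":
    y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
    probs = [0.9, 0.1, 0.6, 0.2, 0.4, 0.1, 0.0, 0.3, 0.2, 0.1]
    print(classification_metrics(y_true, y_pred))
    # {'sensitivity': 0.5, 'specificity': 0.875, 'balanced_accuracy': 0.6875}
    print(round(brier_score(y_true, probs), 4), round(average_precision(y_true, probs), 4))
    # 0.093 0.8333
    X_bal, y_bal = resample(list(range(10)), y_true, "over")
    print(len(y_bal), sum(y_bal))  # 16 8
```

Oversampling duplicates minority-class records and undersampling discards majority-class records. In either case, resampling is normally applied only to the training split, so the reported metrics still reflect the original class imbalance of the evaluation data.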
Pages: 75-80
Number of pages: 6