Performance of variable selection methods for assessing the health effects of correlated exposures in case-control studies

Cited by: 37
Authors
Lenters, Virissa [1]
Vermeulen, Roel [1, 2]
Portengen, Lutzen [1]
Affiliations
[1] Univ Utrecht, Inst Risk Assessment Sci, Div Environm Epidemiol, Utrecht, Netherlands
[2] Univ Med Ctr Utrecht, Dept Epidemiol, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
Keywords
collinearity; environment-wide association; model selection; multipollutant; variable selection; FALSE DISCOVERY RATE; MEASUREMENT ERROR; REGRESSION; REGULARIZATION; INFERENCE; MODELS; COLLINEARITY;
DOI
10.1136/oemed-2016-104231
Chinese Library Classification number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Objectives: There is growing recognition that simultaneously assessing multiple exposures may reduce false positive discoveries and improve epidemiological effect estimates. We evaluated the performance of statistical methods for identifying exposure-outcome associations across various data structures typical of environmental and occupational epidemiology analyses.
Methods: We simulated a case-control study, generating 100 data sets for each of 270 different simulation scenarios, varying the number of exposure variables, the correlation between exposures, sample size, the number of effective exposures and the magnitude of effect estimates. We compared conventional analytical approaches, that is, univariable (with and without multiplicity adjustment), multivariable and stepwise logistic regression, with variable selection methods: sparse partial least squares discriminant analysis, boosting, and frequentist and Bayesian penalised regression approaches.
Results: The variable selection methods consistently yielded more precise effect estimates and generally improved selection accuracy compared with conventional logistic regression methods, especially for scenarios with higher correlation levels. Penalised lasso and elastic net regression both seemed to perform particularly well, specifically when statistical inference based on a balanced weighting of high sensitivity and a low proportion of false discoveries is sought.
Conclusions: In this extensive simulation study with multicollinear data, we found that most variable selection methods consistently outperformed conventional approaches, and demonstrated how performance is influenced by the structure of the data and underlying model.
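The abstract judges selection accuracy by weighing sensitivity against the proportion of false discoveries. The sketch below shows one common way to compute both for a single simulated data set. It is illustrative only: the function name, the exposure labels and the convention of reporting a false discovery proportion of 0 when nothing is selected are assumptions, not definitions taken from the paper.

```python
def selection_metrics(selected, truly_effective):
    """Return (sensitivity, false_discovery_proportion) for one data set.

    Hypothetical helper: `selected` holds the exposures a method flagged,
    `truly_effective` holds the exposures given a non-null effect in the
    simulation.
    """
    selected = set(selected)
    truly_effective = set(truly_effective)
    true_positives = len(selected & truly_effective)

    # Sensitivity: share of truly effective exposures that were selected.
    if truly_effective:
        sensitivity = true_positives / len(truly_effective)
    else:
        sensitivity = float("nan")  # undefined when no exposure has an effect

    # False discovery proportion: share of selected exposures with no true
    # effect. Assumed to be 0 when the method selects nothing.
    if selected:
        fdp = (len(selected) - true_positives) / len(selected)
    else:
        fdp = 0.0
    return sensitivity, fdp


# Made-up example: x1..x3 are truly effective; a method selects four exposures.
sens, fdp = selection_metrics({"x1", "x2", "x7", "x9"}, {"x1", "x2", "x3"})
print(f"sensitivity={sens:.2f}, FDP={fdp:.2f}")
```

Averaging these two quantities over the replicate data sets of a scenario would give a per-scenario summary; how the authors weighted or combined them is not stated in the abstract.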
Pages: 522-529
Page count: 8
Related papers
41 in total
  • [21] Statistical methods for analysis of combined biomarker data from multiple nested case-control studies
    Cheng, Chao
    Sloan, Abigail
    Wang, Molin
    STATISTICAL METHODS IN MEDICAL RESEARCH, 2021, 30 (08) : 1944 - 1959
  • [22] Validity of using ad hoc methods to analyze secondary traits in case-control association studies
    Yung, Godwin
    Lin, Xihong
    GENETIC EPIDEMIOLOGY, 2016, 40 (08) : 732 - 743
  • [23] Semiparametric methods for evaluating risk prediction markers in case-control studies
    Huang, Ying
    Pepe, Margaret Sullivan
    BIOMETRIKA, 2009, 96 (04) : 991 - 997
  • [24] Robust methods for detecting familial aggregation of a quantitative trait in matched case-control family studies
    Wang, Jiun-Yi
    Chen, Li-Ching
    Lin, Hui-Min
    JOURNAL OF APPLIED STATISTICS, 2012, 39 (10) : 2097 - 2111
  • [25] A Double Robust Approach to Causal Effects in Case-Control Studies
    Rose, Sherri
    van der Laan, Mark
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 2014, 179 (06) : 663 - 669
  • [26] Efficient estimation of indirect effects in case-control studies using a unified likelihood framework
    Satten, Glen A.
    Curtis, Sarah W.
    Solis-Lemus, Claudia
    Leslie, Elizabeth J.
    Epstein, Michael P.
    STATISTICS IN MEDICINE, 2022, 41 (15) : 2879 - 2893
  • [27] Assessing Incremental Value of Biomarkers with Multi-Phase Nested Case-Control Studies
    Zhou, Qian M.
    Zheng, Yingye
    Chibnik, Lori B.
    Karlson, Elizabeth W.
    Cai, Tianxi
    BIOMETRICS, 2015, 71 (04) : 1139 - 1149
  • [28] Weighting methods for population-based case-control studies with complex sampling
    Li, Yan
    Graubard, Barry I.
    DiGaetano, Ralph
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES C-APPLIED STATISTICS, 2011, 60 : 165 - 185
  • [29] Uncovering selection bias in case-control studies using Bayesian post-stratification
    Geneletti, S.
    Best, N.
    Toledano, M. B.
    Elliott, P.
    Richardson, S.
    STATISTICS IN MEDICINE, 2013, 32 (15) : 2555 - 2570
  • [30] Evaluating classification performance of biomarkers in two-phase case-control studies
    Wang, Lu
    Huang, Ying
    STATISTICS IN MEDICINE, 2019, 38 (01) : 100 - 114