Performance of variable selection methods for assessing the health effects of correlated exposures in case-control studies

Cited by: 37
Authors
Lenters, Virissa [1]
Vermeulen, Roel [1, 2]
Portengen, Lutzen [1]
Affiliations
[1] Univ Utrecht, Inst Risk Assessment Sci, Div Environm Epidemiol, Utrecht, Netherlands
[2] Univ Med Ctr Utrecht, Dept Epidemiol, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
Keywords
collinearity; environment-wide association; model selection; multipollutant; variable selection; FALSE DISCOVERY RATE; MEASUREMENT ERROR; REGRESSION; REGULARIZATION; INFERENCE; MODELS; COLLINEARITY
DOI
10.1136/oemed-2016-104231
Chinese Library Classification (CLC) number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Objectives: There is growing recognition that simultaneously assessing multiple exposures may reduce false positive discoveries and improve epidemiological effect estimates. We evaluated the performance of statistical methods for identifying exposure-outcome associations across various data structures typical of environmental and occupational epidemiology analyses.
Methods: We simulated a case-control study, generating 100 data sets for each of 270 different simulation scenarios; varying the number of exposure variables, the correlation between exposures, sample size, the number of effective exposures and the magnitude of effect estimates. We compared conventional analytical approaches, that is, univariable (with and without multiplicity adjustment), multivariable and stepwise logistic regression, with variable selection methods: sparse partial least squares discriminant analysis, boosting, and frequentist and Bayesian penalised regression approaches.
Results: The variable selection methods consistently yielded more precise effect estimates and generally improved selection accuracy compared with conventional logistic regression methods, especially for scenarios with higher correlation levels. Penalised lasso and elastic net regression both seemed to perform particularly well, specifically when statistical inference based on a balanced weighting of high sensitivity and a low proportion of false discoveries is sought.
Conclusions: In this extensive simulation study with multicollinear data, we found that most variable selection methods consistently outperformed conventional approaches, and demonstrated how performance is influenced by the structure of the data and underlying model.
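To make the simulation design in the abstract concrete, the following is a minimal Python sketch of one scenario. It is an illustration only, not the authors' code. The function names (`simulate_case_control`, `selection_metrics`), the exchangeable correlation structure, the baseline log-odds of -3, the source-population size and every default value are assumptions, since the abstract does not report them. The sketch draws correlated exposures, samples cases and controls from a simulated source population, and computes the two selection criteria the abstract mentions: sensitivity and the proportion of false discoveries.

```python
import numpy as np


def simulate_case_control(n_cases=200, n_controls=200, n_exposures=10,
                          rho=0.5, true_idx=(0, 1), log_or=0.5,
                          population_size=50_000, seed=0):
    """Simulate one case-control data set with correlated exposures.

    All defaults are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    # Assumed exchangeable correlation: every pair of exposures has correlation rho.
    cov = np.full((n_exposures, n_exposures), rho)
    np.fill_diagonal(cov, 1.0)
    # Only the exposures in true_idx affect the outcome ("effective exposures").
    beta = np.zeros(n_exposures)
    beta[list(true_idx)] = log_or
    # Source population with a logistic disease model (assumed intercept of -3).
    X = rng.multivariate_normal(np.zeros(n_exposures), cov, size=population_size)
    risk = 1.0 / (1.0 + np.exp(-(-3.0 + X @ beta)))
    disease = rng.random(population_size) < risk
    # Case-control sampling: fixed numbers of diseased and non-diseased subjects.
    case_rows = rng.choice(np.flatnonzero(disease), size=n_cases, replace=False)
    control_rows = rng.choice(np.flatnonzero(~disease), size=n_controls, replace=False)
    rows = np.concatenate([case_rows, control_rows])
    y = np.concatenate([np.ones(n_cases, dtype=int), np.zeros(n_controls, dtype=int)])
    return X[rows], y


def selection_metrics(selected, true_idx):
    """Return (sensitivity, false discovery proportion) for a selected exposure set."""
    selected, truth = set(selected), set(true_idx)
    true_positives = len(selected & truth)
    sensitivity = true_positives / len(truth) if truth else float("nan")
    # By convention here, an empty selection has no false discoveries.
    fdp = (len(selected) - true_positives) / len(selected) if selected else 0.0
    return sensitivity, fdp


if __name__ == "__main__":
    X, y = simulate_case_control()
    print(X.shape, y.sum())
    # Hypothetical selection: one true exposure found, two noise exposures selected.
    print(selection_metrics(selected=[0, 2, 3], true_idx=[0, 1]))
```

In a full study of this kind, a selection method (for example a penalised regression) would be fitted to each simulated data set and `selection_metrics` averaged over repetitions and scenarios; that step is omitted here to keep the sketch dependent on NumPy alone.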
Pages: 522-529
Number of pages: 8
Related papers
41 in total
  • [1] Bayesian Variable Selection Methods for Matched Case-Control Studies
    Asafu-Adjei, Josephine
    Tadesse, Mahlet G.
    Coull, Brent
    Balasubramanian, Raji
    Lev, Michael
    Schwamm, Lee
    Betensky, Rebecca
    INTERNATIONAL JOURNAL OF BIOSTATISTICS, 2017, 13 (01)
  • [2] A Bayesian mixture modeling approach for assessing the effects of correlated exposures in case-control studies
    de Vocht, Frank
    Cherry, Nicola
    Wakefield, Jon
    JOURNAL OF EXPOSURE SCIENCE AND ENVIRONMENTAL EPIDEMIOLOGY, 2012, 22 (04) : 352 - 360
  • [3] Performance of variable and function selection methods for estimating the nonlinear health effects of correlated chemical mixtures: A simulation study
    Lazarevic, Nina
    Knibbs, Luke D.
    Sly, Peter D.
    Barnett, Adrian G.
    STATISTICS IN MEDICINE, 2020, 39 (27) : 3947 - 3967
  • [4] Performance of instrumental variable methods in cohort and nested case-control studies: a simulation study
    Uddin, Md Jamal
    Groenwold, Rolf H. H.
    de Boer, Anthonius
    Belitser, Svetlana V.
    Roes, Kit C. B.
    Hoes, Arno W.
    Klungel, Olaf H.
    PHARMACOEPIDEMIOLOGY AND DRUG SAFETY, 2014, 23 (02) : 165 - 177
  • [5] Using Kendall's tau(b) correlations to improve variable selection methods in case-control studies
    O'Gorman, TW
    Woolson, RF
    BIOMETRICS, 1995, 51 (04) : 1451 - 1460
  • [6] The performance of methods for correcting measurement error in case-control studies
    Stürmer, T
    Thürigen, D
    Spiegelman, D
    Blettner, M
    Brenner, H
    EPIDEMIOLOGY, 2002, 13 (05) : 507 - 516
  • [7] Bayesian model averaging: improved variable selection for matched case-control studies
    Mu, Yi
    See, Isaac
    Edwards, Jonathan R.
    EPIDEMIOLOGY BIOSTATISTICS AND PUBLIC HEALTH, 2019, 16 (02)
  • [8] Estimation and selection of complex covariate effects in pooled nested case-control studies with heterogeneity
    Liu, Mengling
    Lu, Wenbin
    Krogh, Vittorio
    Hallmans, Goran
    Clendenen, Tess V.
    Zeleniuch-Jacquotte, Anne
    BIOSTATISTICS, 2013, 14 (04) : 682 - 694
  • [9] Efficiency loss from categorizing quantitative exposures into qualitative exposures in case-control studies
    Zhao, LP
    Kolonel, LN
    AMERICAN JOURNAL OF EPIDEMIOLOGY, 1992, 136 (04) : 464 - 474
  • [10] Assessing risk prediction models in case-control studies using semiparametric and nonparametric methods
    Huang, Ying
    Pepe, Margaret Sullivan
    STATISTICS IN MEDICINE, 2010, 29 (13) : 1391 - 1410