Performance of variable selection methods for assessing the health effects of correlated exposures in case-control studies

Times cited: 37
Authors
Lenters, Virissa [1]
Vermeulen, Roel [1, 2]
Portengen, Lutzen [1]
Affiliations
[1] Univ Utrecht, Inst Risk Assessment Sci, Div Environm Epidemiol, Utrecht, Netherlands
[2] Univ Med Ctr Utrecht, Dept Epidemiol, Julius Ctr Hlth Sci & Primary Care, Utrecht, Netherlands
Keywords
collinearity; environment-wide association; model selection; multipollutant; variable selection; false discovery rate; measurement error; regression; regularization; inference; models
DOI
10.1136/oemed-2016-104231
Chinese Library Classification number
R1 [Preventive medicine, hygiene]
Discipline classification code
1004; 120402
Abstract
Objectives: There is growing recognition that simultaneously assessing multiple exposures may reduce false positive discoveries and improve epidemiological effect estimates. We evaluated the performance of statistical methods for identifying exposure-outcome associations across various data structures typical of environmental and occupational epidemiology analyses.

Methods: We simulated a case-control study, generating 100 data sets for each of 270 different simulation scenarios; varying the number of exposure variables, the correlation between exposures, sample size, the number of effective exposures and the magnitude of effect estimates. We compared conventional analytical approaches, that is, univariable (with and without multiplicity adjustment), multivariable and stepwise logistic regression, with variable selection methods: sparse partial least squares discriminant analysis, boosting, and frequentist and Bayesian penalised regression approaches.

Results: The variable selection methods consistently yielded more precise effect estimates and generally improved selection accuracy compared with conventional logistic regression methods, especially for scenarios with higher correlation levels. Penalised lasso and elastic net regression both seemed to perform particularly well, specifically when statistical inference based on a balanced weighting of high sensitivity and a low proportion of false discoveries is sought.

Conclusions: In this extensive simulation study with multicollinear data, we found that most variable selection methods consistently outperformed conventional approaches, and demonstrated how performance is influenced by the structure of the data and underlying model.
Pages: 522-529
Number of pages: 8
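
The abstract judges selection methods by how they balance high sensitivity against a low proportion of false discoveries. As an illustration only, the sketch below computes both quantities for a single simulated data set. It assumes the usual definitions (sensitivity = share of truly associated exposures that were selected; false discovery proportion = share of selected exposures that have no true effect). The function name `selection_metrics` and the exposure labels are hypothetical and are not taken from the paper.

```python
def selection_metrics(selected, truly_associated):
    """Return (sensitivity, false discovery proportion) for one selection.

    selected          -- exposures a method flagged as associated
    truly_associated  -- exposures given a non-zero effect in the simulation
    Both definitions are assumptions stated in the lead-in, not the paper's code.
    """
    selected = set(selected)
    truth = set(truly_associated)
    true_positives = len(selected & truth)

    # Sensitivity is undefined when no exposure has a true effect.
    sensitivity = true_positives / len(truth) if truth else float("nan")

    # With nothing selected there are no discoveries, hence no false ones.
    if selected:
        fdp = (len(selected) - true_positives) / len(selected)
    else:
        fdp = 0.0
    return sensitivity, fdp


# Hypothetical scenario: 3 effective exposures, a method selects 4 variables.
sens, fdp = selection_metrics(
    selected={"x1", "x2", "x7", "x9"},
    truly_associated={"x1", "x2", "x3"},
)
print(f"sensitivity={sens:.3f}, false discovery proportion={fdp:.3f}")
```

In a simulation like the one described, these two numbers would typically be averaged over the repeated data sets of each scenario before methods are compared.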