A Fast and Accurate Method for Genome-wide Scale Phenome-wide G x E Analysis and Its Application to UK Biobank

Cited by: 19
Authors
Bi, Wenjian [1, 2]
Zhao, Zhangchen [1, 2]
Dey, Rounak [1, 2, 3]
Fritsche, Lars G. [1, 2]
Mukherjee, Bhramar [1]
Lee, Seunggeun [1, 2]
Affiliations
[1] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[3] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
Funding
National Institutes of Health (USA)
Keywords
GENE-ENVIRONMENT INTERACTION; MIXED-MODEL ANALYSIS; ASSOCIATION; DISEASE; SMOKING; RISK; INFERENCE; VARIANT; TRAITS; GENDER;
DOI
10.1016/j.ajhg.2019.10.008
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (G x E) effects. Compared with marginal genetic association studies, G x E analysis requires more samples and detailed measures of environmental exposures, and this limits the possible discoveries. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK Biobank, can be ideal resources for identifying G x E effects. However, due to the large computation cost and the presence of case-control imbalance, existing methods often fail. Here we propose a scalable and accurate method, SPAGE (SaddlePoint Approximation implementation of G x E analysis), that is applicable to genome-wide scale phenome-wide G x E studies. SPAGE fits a genotype-independent logistic model only once across the genome-wide analysis in order to reduce computation cost, and it uses a saddlepoint approximation (SPA) to calibrate the test statistics for analysis of phenotypes with unbalanced case-control ratios. Simulation studies show that SPAGE is 33-79 times faster than the Wald test and 72-439 times faster than Firth's test, and that SPAGE can control type I error rates at the genome-wide significance level even when case-control ratios are extremely unbalanced. Through the analysis of UK Biobank data of 344,341 white British European-ancestry samples, we show that SPAGE can efficiently analyze large samples while controlling for unbalanced case-control ratios.
Pages: 1182-1192
Page count: 11