A Fast and Accurate Method for Genome-wide Scale Phenome-wide G x E Analysis and Its Application to UK Biobank

Cited by: 19
Authors
Bi, Wenjian [1, 2]
Zhao, Zhangchen [1, 2]
Dey, Rounak [1, 2, 3]
Fritsche, Lars G. [1, 2]
Mukherjee, Bhramar [1]
Lee, Seunggeun [1, 2]
Affiliations
[1] Univ Michigan, Dept Biostat, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[3] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
Funding
National Institutes of Health (USA);
Keywords
GENE-ENVIRONMENT INTERACTION; MIXED-MODEL ANALYSIS; ASSOCIATION; DISEASE; SMOKING; RISK; INFERENCE; VARIANT; TRAITS; GENDER;
DOI
10.1016/j.ajhg.2019.10.008
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
The etiology of most complex diseases involves genetic variants, environmental factors, and gene-environment interaction (G x E) effects. Compared with marginal genetic association studies, G x E analysis requires more samples and detailed measurement of environmental exposures, which limits the possible discoveries. Large-scale population-based biobanks with detailed phenotypic and environmental information, such as UK Biobank, can be ideal resources for identifying G x E effects. However, because of the large computational cost and the presence of case-control imbalance, existing methods often fail. Here we propose a scalable and accurate method, SPAGE (SaddlePoint Approximation implementation of G x E analysis), that is applicable to genome-wide scale phenome-wide G x E studies. To reduce computational cost, SPAGE fits a genotype-independent logistic model only once across the genome-wide analysis, and it uses a saddlepoint approximation (SPA) to calibrate the test statistics when analyzing phenotypes with unbalanced case-control ratios. Simulation studies show that SPAGE is 33-79 times faster than the Wald test and 72-439 times faster than Firth's test, and that SPAGE controls type I error rates at the genome-wide significance level even when case-control ratios are extremely unbalanced. Through the analysis of UK Biobank data on 344,341 white British European-ancestry samples, we show that SPAGE can efficiently analyze large samples while controlling for unbalanced case-control ratios.
Pages: 1182-1192
Number of pages: 11