UK Biobank Whole-Exome Sequence Binary Phenome Analysis with Robust Region-Based Rare-Variant Test

Times cited: 54
Authors
Zhao, Zhangchen [1, 2]
Bi, Wenjian [1, 2]
Zhou, Wei [3, 4, 5]
VandeHaar, Peter [1, 2]
Fritsche, Lars G. [1, 2]
Lee, Seunggeun [1, 2]
Affiliations
[1] Univ Michigan, Sch Publ Hlth, Dept Biostat, Ann Arbor, MI 48109 USA
[2] Univ Michigan, Sch Publ Hlth, Ctr Stat Genet, Ann Arbor, MI 48109 USA
[3] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
[4] Broad Inst Harvard & MIT, Program Med & Populat Genet, Cambridge, MA 02142 USA
[5] Broad Inst Harvard & MIT, Stanley Ctr Psychiat Res, Cambridge, MA 02142 USA
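Funding
UK Medical Research Council; US National Institutes of Health
Keywords
ASSOCIATION ANALYSIS; COMMON DISEASES; SINGLE; RISK; GENE
DOI
10.1016/j.ajhg.2019.11.012
Chinese Library Classification number
Q3 [Genetics]
Subject classification codes
071007; 090102
Abstract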
In biobank data analysis, most binary phenotypes have unbalanced case-control ratios, which can inflate type I error rates. Recently, a single-variant test based on saddlepoint approximation (SPA) was developed to provide an accurate and scalable way to test for associations with such phenotypes. For gene- or region-based multiple-variant tests, a few methods exist that can adjust for unbalanced case-control ratios; however, these methods are either less accurate when case-control ratios are extremely unbalanced or not scalable to large data analyses. To address these problems, we propose SKAT- and SKAT-O-type region-based tests in which the single-variant score statistic is calibrated using SPA and efficient resampling (ER). Through simulation studies, we show that the proposed method provides well-calibrated p values. In contrast, when the case-control ratio is 1:99, the unadjusted approach has greatly inflated type I error rates (90 times the exome-wide sequencing alpha = 2.5 × 10^-6). Additionally, the proposed method has computation time similar to that of the unadjusted approaches and is scalable to large-sample data. In our application, the UK Biobank whole-exome sequence data analysis of 45,596 unrelated European samples and 791 PheCode phenotypes identified 10 rare-variant associations with p value < 10^-7, including the associations between JAK2 and myeloproliferative disease, HOXB13 and cancer of prostate, and F11 and congenital coagulation defects. All analysis summary results are publicly available through a web-based visual server, which can help facilitate the identification of the genetic basis of complex diseases.
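To make the calibration idea in the abstract concrete, the following is a minimal sketch of a saddlepoint approximation to the upper-tail p value of a single-variant score statistic. It is not the authors' implementation. It assumes a score statistic of the form S = sum_i g_i (Y_i - mu_i) with independent Bernoulli outcomes, non-negative genotype counts g_i, and known null case probabilities mu_i, and it uses a Barndorff-Nielsen-type tail formula. All function names are hypothetical, and the region-based SKAT and SKAT-O steps and efficient resampling are not shown.

```python
import math


def normal_upper_tail(z):
    """P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def score_stat(g, y, mu):
    """Score statistic S = sum_i g_i * (y_i - mu_i) (assumed form)."""
    return sum(gi * (yi - mi) for gi, yi, mi in zip(g, y, mu))


def normal_approx_upper_tail(s, g, mu):
    """Unadjusted p value: treat S as normal with its null variance."""
    var = sum(gi * gi * mi * (1.0 - mi) for gi, mi in zip(g, mu))
    return normal_upper_tail(s / math.sqrt(var))


def spa_upper_tail(s, g, mu, tol=1e-12):
    """Saddlepoint approximation to the upper tail of S at s > 0.

    Assumes g_i >= 0 and Y_i ~ Bernoulli(mu_i) independently.
    """
    s_max = sum(gi * (1.0 - mi) for gi, mi in zip(g, mu))
    if not 0.0 < s < s_max:
        raise ValueError("s must lie strictly between 0 and its maximum")

    def tilted_p(t, gi, mi):
        # Case probability under exponential tilting by t.
        return mi / (mi + (1.0 - mi) * math.exp(-gi * t))

    def K(t):
        # Cumulant generating function of S.
        return sum(math.log1p(mi * math.expm1(gi * t)) - t * gi * mi
                   for gi, mi in zip(g, mu))

    def K1(t):
        # First derivative of K.
        return sum(gi * (tilted_p(t, gi, mi) - mi) for gi, mi in zip(g, mu))

    def K2(t):
        # Second derivative of K.
        total = 0.0
        for gi, mi in zip(g, mu):
            p = tilted_p(t, gi, mi)
            total += gi * gi * p * (1.0 - p)
        return total

    # K1 is increasing, so solve K1(t) = s by bracketing and bisection.
    lo, hi = 0.0, 1.0
    while K1(hi) < s:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if K1(mid) < s:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    t_hat = 0.5 * (lo + hi)

    w = math.sqrt(2.0 * (t_hat * s - K(t_hat)))
    v = t_hat * math.sqrt(K2(t_hat))
    return normal_upper_tail(w + math.log(v / w) / w)


# Toy example (made-up numbers): 20 carriers, 1% baseline risk, 3 cases.
g = [1.0] * 20
mu = [0.01] * 20
y = [1] * 3 + [0] * 17
s = score_stat(g, y, mu)
print("normal approximation:", normal_approx_upper_tail(s, g, mu))
print("saddlepoint approximation:", spa_upper_tail(s, g, mu))
```

In this toy setting the outcome is rare, so the score statistic is strongly skewed and the normal approximation can understate the tail probability by orders of magnitude, which is the kind of miscalibration the abstract attributes to unbalanced case-control ratios. The formula is unstable near s = 0, so a sketch like this is only meaningful in the tail.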
Pages: 3-12
Number of pages: 10