Improved polygenic prediction by Bayesian multiple regression on summary statistics

Cited by: 296
Authors
Lloyd-Jones, Luke R. [1]
Zeng, Jian [1,8]
Sidorenko, Julia [1,2]
Yengo, Loic [1]
Moser, Gerhard [3,4]
Kemper, Kathryn E. [1]
Wang, Huanwei [1]
Zheng, Zhili [1]
Magi, Reedik [2]
Esko, Tonu [2]
Metspalu, Andres [2,5]
Wray, Naomi R. [1,6]
Goddard, Michael E. [7]
Yang, Jian [1]
Visscher, Peter M. [1]
Affiliations
[1] Univ Queensland, Inst Mol Biosci, Brisbane, Qld 4072, Australia
[2] Univ Tartu, Inst Genom, Estonian Genome Ctr, Riia 23b, EE-51010 Tartu, Estonia
[3] Cent Queensland Univ, Sch Engn & Technol, Rockhampton, Qld 4702, Australia
[4] Australian Agr Co Ltd, Brisbane, Qld 4006, Australia
[5] Univ Tartu, Inst Mol & Cell Biol, EE-51010 Tartu, Estonia
[6] Univ Queensland, Queensland Brain Inst, Brisbane, Qld 4072, Australia
[7] Univ Melbourne, Fac Vet & Agr Sci, Melbourne, Vic 3052, Australia
[8] Wenzhou Med Univ, Inst Adv Res, Wenzhou 325027, Zhejiang, Peoples R China
Funding
UK Medical Research Council; Australian Research Council;
Keywords
GENOME-WIDE ASSOCIATION; MODELING LINKAGE DISEQUILIBRIUM; COMPLEX TRAITS; MIXED-MODEL; MISSING HERITABILITY; RISK; VARIANTS; IMPUTATION; ACCURACY; POPULATION;
DOI
10.1038/s41467-019-12653-0
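Chinese Library Classification (CLC) codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences];
Discipline classification codes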
07 ; 0710 ; 09 ;
Abstract
Accurate prediction of an individual's phenotype from their DNA sequence is one of the great promises of genomics and precision medicine. We extend a powerful individual-level data Bayesian multiple regression model (BayesR) to one that utilises summary statistics from genome-wide association studies (GWAS), SBayesR. In simulation and cross-validation using 12 real traits and 1.1 million variants on 350,000 individuals from the UK Biobank, SBayesR improves prediction accuracy relative to commonly used state-of-the-art summary statistics methods at a fraction of the computational resources. Furthermore, using summary statistics for variants from the largest GWAS meta-analysis (n ≈ 700,000) on height and BMI, we show that, on average across traits and two independent data sets, SBayesR improves prediction R² by 5.2% relative to LDpred and by 26.5% relative to clumping and p value thresholding.
Pages: 11
Related papers
68 in total
[1]   Fast Principal Component Analysis of Large-Scale Genome-Wide Data [J].
Abraham, Gad ;
Inouye, Michael .
PLOS ONE, 2014, 9 (04)
[2]   A global reference for human genetic variation [J].
Altshuler, David M. ;
Durbin, Richard M. ;
Abecasis, Goncalo R. ;
Bentley, David R. ;
Chakravarti, Aravinda ;
Clark, Andrew G. ;
Donnelly, Peter ;
Eichler, Evan E. ;
Flicek, Paul ;
Gabriel, Stacey B. ;
Gibbs, Richard A. ;
Green, Eric D. ;
Hurles, Matthew E. ;
Knoppers, Bartha M. ;
Korbel, Jan O. ;
Lander, Eric S. ;
Lee, Charles ;
Lehrach, Hans ;
Mardis, Elaine R. ;
Marth, Gabor T. ;
McVean, Gil A. ;
Nickerson, Deborah A. ;
Wang, Jun ;
Wilson, Richard K. ;
Boerwinkle, Eric ;
Doddapaneni, Harsha ;
Han, Yi ;
Korchina, Viktoriya ;
Kovar, Christie ;
Lee, Sandra ;
Muzny, Donna ;
Reid, Jeffrey G. ;
Zhu, Yiming ;
Chang, Yuqi ;
Feng, Qiang ;
Fang, Xiaodong ;
Guo, Xiaosen ;
Jian, Min ;
Jiang, Hui ;
Jin, Xin ;
Lan, Tianming ;
Li, Guoqing ;
Li, Jingxiang ;
Li, Yingrui ;
Liu, Shengmao ;
Liu, Xiao ;
Lu, Yao ;
Ma, Xuedi ;
Tang, Meifang ;
Wang, Bo .
NATURE, 2015, 526 (7571) :68-+
[3]   Building the foundation for genomics in precision medicine [J].
Aronson, Samuel J. ;
Rehm, Heidi L. .
NATURE, 2015, 526 (7573) :336-342
[4]   Prospects of Fine-Mapping Trait-Associated Genomic Regions by Using Summary Statistics from Genome-wide Association Studies [J].
Benner, Christian ;
Havulinna, Aki S. ;
Jarvelin, Marjo-Riitta ;
Salomaa, Veikko ;
Ripatti, Samuli ;
Pirinen, Matti .
AMERICAN JOURNAL OF HUMAN GENETICS, 2017, 101 (04) :539-551
[5]   An atlas of genetic correlations across human diseases and traits [J].
Bulik-Sullivan, Brendan ;
Finucane, Hilary K. ;
Anttila, Verneri ;
Gusev, Alexander ;
Day, Felix R. ;
Loh, Po-Ru ;
Duncan, Laramie ;
Perry, John R. B. ;
Patterson, Nick ;
Robinson, Elise B. ;
Daly, Mark J. ;
Price, Alkes L. ;
Neale, Benjamin M. .
NATURE GENETICS, 2015, 47 (11) :1236-+
[6]   LD Score regression distinguishes confounding from polygenicity in genome-wide association studies [J].
Bulik-Sullivan, Brendan K. ;
Loh, Po-Ru ;
Finucane, Hilary K. ;
Ripke, Stephan ;
Yang, Jian ;
Patterson, Nick ;
Daly, Mark J. ;
Price, Alkes L. ;
Neale, Benjamin M. .
NATURE GENETICS, 2015, 47 (03) :291-+
[7]   The UK Biobank resource with deep phenotyping and genomic data [J].
Bycroft, Clare ;
Freeman, Colin ;
Petkova, Desislava ;
Band, Gavin ;
Elliott, Lloyd T. ;
Sharp, Kevin ;
Motyer, Allan ;
Vukcevic, Damjan ;
Delaneau, Olivier ;
O'Connell, Jared ;
Cortes, Adrian ;
Welsh, Samantha ;
Young, Alan ;
Effingham, Mark ;
McVean, Gil ;
Leslie, Stephen ;
Allen, Naomi ;
Donnelly, Peter ;
Marchini, Jonathan .
NATURE, 2018, 562 (7726) :203-+
[8]   Second-generation PLINK: rising to the challenge of larger and richer datasets [J].
Chang, Christopher C. ;
Chow, Carson C. ;
Tellier, Laurent C. A. M. ;
Vattikuti, Shashaank ;
Purcell, Shaun M. ;
Lee, James J. .
GIGASCIENCE, 2015, 4
[9]   Developing and evaluating polygenic risk prediction models for stratified disease prevention [J].
Chatterjee, Nilanjan ;
Shi, Jianxin ;
Garcia-Closas, Montserrat .
NATURE REVIEWS GENETICS, 2016, 17 (07) :392-406
[10]   Predicting genetic predisposition in humans: the promise of whole-genome markers [J].
de los Campos, Gustavo ;
Gianola, Daniel ;
Allison, David B. .
NATURE REVIEWS GENETICS, 2010, 11 (12) :880-886