Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores

Cited by: 870
Authors
Vilhjalmsson, Bjarni J. [1 ,2 ,3 ]
Yang, Jian [1 ,5 ,6 ]
Finucane, Hilary K. [1 ,2 ,3 ,4 ,6 ]
Gusev, Alexander [2 ,3 ]
Lindstrom, Sara [1 ,2 ]
Ripke, Stephan [7 ,8 ,9 ,10 ]
Genovese, Giulio [3 ,7 ,8 ,11 ]
Loh, Po-Ru [1 ,2 ,3 ]
Bhatia, Gaurav [1 ,2 ,3 ]
Do, Ron [12 ]
Hayeck, Tristan [1 ,2 ,3 ]
Won, Hong-Hee [3 ,14 ,15 ]
Kathiresan, Sekar [3 ,14 ,15 ]
Pato, Michele [16 ]
Pato, Carlos [16 ]
Tamimi, Rulla [1 ,2 ,17 ]
Stahl, Eli [3 ,13 ,18 ,19 ]
Zaitlen, Noah [20 ]
Pasaniuc, Bogdan [21 ]
Belbin, Gillian
Kenny, Eimear E. [12 ,13 ,19 ,22 ]
Schierup, Mikkel H.
De Jager, Philip [3 ,23 ,24 ]
Patsopoulos, Nikolaos A. [3 ,23 ,24 ]
McCarroll, Steve [3 ,7 ,8 ,11 ]
Daly, Mark [3 ,7 ,8 ]
Purcell, Shaun [3 ,13 ,18 ,19 ]
Chasman, Daniel [23 ,25 ]
Neale, Benjamin [3 ,7 ]
Goddard, Michael [26 ,27 ]
Visscher, Peter M. [5 ,6 ]
Kraft, Peter [1 ,2 ,3 ,28 ]
Patterson, Nick [3 ]
Price, Alkes L. [1 ,2 ,3 ,28 ]
Affiliations
[1] Harvard Univ, TH Chan Sch Publ Hlth, Dept Epidemiol, Boston, MA 02115 USA
[2] Harvard Univ, TH Chan Sch Publ Hlth, Program Genet Epidemiol & Stat Genet, Boston, MA 02115 USA
[3] Broad Inst Harvard & MIT, Program Med & Populat Genet, Cambridge, MA 02142 USA
[4] Aarhus Univ, Bioinformat Res Ctr, DK-8000 Aarhus, Denmark
[5] Univ Queensland, Queensland Brain Inst, Brisbane, Qld 4072, Australia
[6] Univ Queensland, Diamantina Inst, Translat Res Inst, Brisbane, Qld 4101, Australia
[7] MIT, Dept Math, Cambridge, MA 02139 USA
[8] Broad Inst Harvard & MIT, Stanley Ctr Psychiat Res, Cambridge, MA 02142 USA
[9] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA 02114 USA
[10] Charite, Dept Psychiat & Psychotherapy, D-10117 Berlin, Germany
[11] Harvard Univ, Sch Med, Dept Genet, Boston, MA 02115 USA
[12] Icahn Sch Med Mt Sinai, Charles Bronfman Inst Personalized Med, New York, NY 10029 USA
[13] Icahn Sch Med Mt Sinai, Dept Genet & Genom Sci, New York, NY 10029 USA
[14] Massachusetts Gen Hosp, Cardiovasc Res Ctr, Boston, MA 02114 USA
[15] Harvard Univ, Sch Med, Boston, MA 02114 USA
[16] Univ So Calif, Dept Psychiat & Behav Sci, Keck Sch Med, Los Angeles, CA 90089 USA
[17] Brigham & Womens Hosp, Channing Div Network Med, Boston, MA 02115 USA
[18] Icahn Sch Med Mt Sinai, Div Psychiat Genom, New York, NY 10029 USA
[19] Icahn Sch Med Mt Sinai, Ctr Stat Genet, New York, NY 10029 USA
[20] Univ Calif San Francisco, Lung Biol Ctr, Dept Med, San Francisco, CA 94143 USA
[21] Univ Calif Los Angeles, Dept Pathol & Lab Med, Los Angeles, CA 90095 USA
[22] Icahn Sch Med Mt Sinai, Icahn Inst Genom & Multiscale Biol, New York, NY 10029 USA
[23] Harvard Univ, Sch Med, Dept Med, Boston, MA 02115 USA
[24] Brigham & Womens Hosp, Program Translat NeuroPsychiat Genom, Ann Romney Ctr Neurol Dis, Dept Neurol, Boston, MA 02115 USA
[25] Brigham & Womens Hosp, Div Prevent Med, Boston, MA 02215 USA
[26] Univ Melbourne, Dept Food & Agr Syst, Parkville, Vic 3010, Australia
[27] Dept Primary Ind, Biosci Res Div, Bundoora, Vic 3083, Australia
[28] Harvard Univ, TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02115 USA
Funding
Wellcome Trust (UK);
Keywords
GENOME-WIDE ASSOCIATION; ANALYSIS IDENTIFIES 13; GENETIC-RISK; SUSCEPTIBILITY LOCI; COMPLEX TRAITS; MIXED-MODEL; PREDICTION; BREAST; ARCHITECTURE; REGRESSION;
DOI
10.1016/j.ajhg.2015.09.001
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007 ; 090102 ;
Abstract
Polygenic risk scores have shown great promise in predicting complex disease risk and will become more accurate as training sample sizes increase. The standard approach for calculating risk scores involves linkage disequilibrium (LD)-based marker pruning and applying a p value threshold to association statistics, but this discards information and can reduce predictive accuracy. We introduce LDpred, a method that infers the posterior mean effect size of each marker by using a prior on effect sizes and LD information from an external reference panel. Theory and simulations show that LDpred outperforms the approach of pruning followed by thresholding, particularly at large sample sizes. Accordingly, predicted R² increased from 20.1% to 25.3% in a large schizophrenia dataset and from 9.8% to 12.0% in a large multiple sclerosis dataset. A similar relative improvement in accuracy was observed for three additional large disease datasets and for non-European schizophrenia samples. The advantage of LDpred over existing methods will grow as sample sizes increase.
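To make the idea of a "posterior mean effect size" under LD concrete, here is a minimal sketch of one special case. It assumes a Gaussian (infinitesimal) prior on effect sizes, for which the posterior mean within an LD block has the approximate closed form (M / (N h²) · I + D)⁻¹ β̃, where β̃ holds the marginal estimates, D is the LD (correlation) matrix from the reference panel, N is the training sample size, M the number of markers, and h² the heritability they explain. The abstract does not state this formula, so treat it as an assumption; the function names `solve` and `ldpred_inf` are hypothetical and this is not the authors' implementation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination
    with partial pivoting (pure Python, intended for small LD blocks)."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ldpred_inf(beta_marginal, ld, n_samples, n_markers, h2):
    """Posterior mean effects under an assumed infinitesimal prior:
    (M / (N * h2) * I + D)^-1 * beta_marginal for one LD block."""
    shrink = n_markers / (n_samples * h2)
    size = len(ld)
    a = [[ld[i][j] + (shrink if i == j else 0.0) for j in range(size)]
         for i in range(size)]
    return solve(a, beta_marginal)


# Two markers in LD (r = 0.5) with identical marginal estimates.
ld_block = [[1.0, 0.5], [0.5, 1.0]]
posterior = ldpred_inf([0.3, 0.3], ld_block, n_samples=1000,
                       n_markers=1000, h2=0.5)
print(posterior)
```

With no LD (D equal to the identity) the expression reduces to uniform shrinkage of each marginal estimate by h² / (h² + M/N). With correlated markers, the shared signal is split between them rather than one marker being discarded, which is the information that pruning followed by thresholding throws away.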
Pages: 576-592
Page count: 17
Related papers
64 in total
[1]   SparSNP: Fast and memory-efficient analysis of all SNPs for phenotype prediction [J].
Abraham, Gad ;
Kowalczyk, Adam ;
Zobel, Justin ;
Inouye, Michael .
BMC BIOINFORMATICS, 2012, 13
[2]   Hundreds of variants clustered in genomic loci and biological pathways affect human height [J].
Allen, Hana Lango ;
Estrada, Karol ;
Lettre, Guillaume ;
Berndt, Sonja I. ;
Weedon, Michael N. ;
Rivadeneira, Fernando ;
Willer, Cristen J. ;
Jackson, Anne U. ;
Vedantam, Sailaja ;
Raychaudhuri, Soumya ;
Ferreira, Teresa ;
Wood, Andrew R. ;
Weyant, Robert J. ;
Segre, Ayellet V. ;
Speliotes, Elizabeth K. ;
Wheeler, Eleanor ;
Soranzo, Nicole ;
Park, Ju-Hyun ;
Yang, Jian ;
Gudbjartsson, Daniel ;
Heard-Costa, Nancy L. ;
Randall, Joshua C. ;
Qi, Lu ;
Smith, Albert Vernon ;
Maegi, Reedik ;
Pastinen, Tomi ;
Liang, Liming ;
Heid, Iris M. ;
Luan, Jian'an ;
Thorleifsson, Gudmar ;
Winkler, Thomas W. ;
Goddard, Michael E. ;
Lo, Ken Sin ;
Palmer, Cameron ;
Workalemahu, Tsegaselassie ;
Aulchenko, Yurii S. ;
Johansson, Asa ;
Zillikens, M. Carola ;
Feitosa, Mary F. ;
Esko, Tonu ;
Johnson, Toby ;
Ketkar, Shamika ;
Kraft, Peter ;
Mangino, Massimo ;
Prokopenko, Inga ;
Absher, Devin ;
Albrecht, Eva ;
Ernst, Florian ;
Glazer, Nicole L. ;
Hayward, Caroline .
NATURE, 2010, 467 (7317) :832-838
[3]   LD Score regression distinguishes confounding from polygenicity in genome-wide association studies [J].
Bulik-Sullivan, Brendan K. ;
Loh, Po-Ru ;
Finucane, Hilary K. ;
Ripke, Stephan ;
Yang, Jian ;
Patterson, Nick ;
Daly, Mark J. ;
Price, Alkes L. ;
Neale, Benjamin M. .
NATURE GENETICS, 2015, 47 (03) :291-+
[4]   Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls [J].
Burton, Paul R. ;
Clayton, David G. ;
Cardon, Lon R. ;
Craddock, Nick ;
Deloukas, Panos ;
Duncanson, Audrey ;
Kwiatkowski, Dominic P. ;
McCarthy, Mark I. ;
Ouwehand, Willem H. ;
Samani, Nilesh J. ;
Todd, John A. ;
Donnelly, Peter ;
Barrett, Jeffrey C. ;
Davison, Dan ;
Easton, Doug ;
Evans, David ;
Leung, Hin-Tak ;
Marchini, Jonathan L. ;
Morris, Andrew P. ;
Spencer, Chris C. A. ;
Tobin, Martin D. ;
Attwood, Antony P. ;
Boorman, James P. ;
Cant, Barbara ;
Everson, Ursula ;
Hussey, Judith M. ;
Jolley, Jennifer D. ;
Knight, Alexandra S. ;
Koch, Kerstin ;
Meech, Elizabeth ;
Nutland, Sarah ;
Prowse, Christopher V. ;
Stevens, Helen E. ;
Taylor, Niall C. ;
Walters, Graham R. ;
Walker, Neil M. ;
Watkins, Nicholas A. ;
Winzer, Thilo ;
Jones, Richard W. ;
McArdle, Wendy L. ;
Ring, Susan M. ;
Strachan, David P. ;
Pembrey, Marcus ;
Breen, Gerome ;
St Clair, David ;
Caesar, Sian ;
Gordon-Smith, Katherine ;
Jones, Lisa ;
Fraser, Christine ;
Green, Elain K. .
NATURE, 2007, 447 (7145) :661-678
[5]   Evidence for Polygenic Susceptibility to Multiple Sclerosis-The Shape of Things to Come [J].
Bush, William S. ;
Sawcer, Stephen J. ;
de Jager, Philip L. ;
Oksenberg, Jorge R. ;
McCauley, Jacob L. ;
Pericak-Vance, Margaret A. ;
Haines, Jonathan L. .
AMERICAN JOURNAL OF HUMAN GENETICS, 2010, 86 (04) :621-625
[6]   Scalable Variational Inference for Bayesian Variable Selection in Regression, and Its Accuracy in Genetic Association Studies [J].
Carbonetto, Peter ;
Stephens, Matthew .
BAYESIAN ANALYSIS, 2012, 7 (01) :73-107
[7]   Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies [J].
Chatterjee, Nilanjan ;
Wheeler, Bill ;
Sampson, Joshua ;
Hartge, Patricia ;
Chanock, Stephen J. ;
Park, Ju-Hyun .
NATURE GENETICS, 2013, 45 (04) :400-405
[8]   Explicit Modeling of Ancestry Improves Polygenic Risk Scores and BLUP Prediction [J].
Chen, Chia-Yen ;
Han, Jiali ;
Hunter, David J. ;
Kraft, Peter ;
Price, Alkes L. .
GENETIC EPIDEMIOLOGY, 2015, 39 (06) :427-438
[9]   Accuracy of Predicting the Genetic Risk of Disease Using a Genome-Wide Approach [J].
Daetwyler, Hans D. ;
Villanueva, Beatriz ;
Woolliams, John A. .
PLOS ONE, 2008, 3 (10)
[10]   Predicting genetic predisposition in humans: the promise of whole-genome markers [J].
de los Campos, Gustavo ;
Gianola, Daniel ;
Allison, David B. .
NATURE REVIEWS GENETICS, 2010, 11 (12) :880-886