Evaluation of polygenic scoring methods in five biobanks shows larger variation between biobanks than methods and finds benefits of ensemble learning

Cited by: 8
Authors
Monti, Remo [1, 2]
Eick, Lisa [3]
Hudjashov, Georgi [4]
Lall, Kristi [4]
Kanoni, Stavroula [5]
Wolford, Brooke N. [6]
Wingfield, Benjamin [7]
Pain, Oliver [8, 9]
Wharrie, Sophie [10]
Jermy, Bradley [3]
McMahon, Aoife [7]
Hartonen, Tuomo [3]
Heyne, Henrike [1]
Mars, Nina [3, 11, 12]
Lambert, Samuel [13]
Hveem, Kristian [13]
Inouye, Michael [14, 15, 16, 17, 18, 19, 20]
van Heel, David A. [21]
Magi, Reedik [1, 2, 19]
Marttinen, Pekka [3, 10, 11]
Ripatti, Samuli [22]
Ganna, Andrea [23]
Lippert, Christoph [1, 24, 25, 26, 27]
Affiliations
[1] Univ Potsdam, Hasso Plattner Inst, Digital Engn Fac, Potsdam, Germany
[2] Berlin Inst Med Syst Biol, Max Delbruck Ctr Mol Med Helmholtz Assoc, Berlin, Germany
[3] Univ Helsinki, Inst Mol Med Finland, Helsinki Inst Life Sci, Helsinki, Finland
[4] Univ Tartu, Inst Genom, Estonian Genome Ctr, Tartu, Estonia
[5] Queen Mary Univ London, William Harvey Res Inst, Barts & London Sch Med & Dent, London, England
[6] Norwegian Univ Sci & Technol, Fac Med & Hlth, KG Jebsen Ctr Genet Epidemiol, Dept Publ Hlth & Nursing, Trondheim, Norway
[7] European Bioinformat Inst, European Mol Biol Lab, Wellcome Genome Campus, Cambridge, England
[8] Kings Coll London, Dept Basic & Clin Neurosci, Maurice Wohl Clin Neurosci Inst, London, England
[9] Kings Coll London, Inst Psychiat, Psychol & Neurosci, London, England
[10] Aalto Univ, Dept Comp Sci, Espoo, Finland
[11] Massachusetts Gen Hosp, Analyt & Translat Genet Unit, Boston, MA USA
[12] Broad Inst MIT & Harvard, Stanley Ctr Psychiat Res & Program Med & Populat G, Cambridge, MA USA
[13] Levanger Hosp, Nord Trondelag Hosp Trust, Levanger, Norway
[14] Univ Cambridge, Dept Publ Hlth & Primary Care, Cambridge Baker Syst Genom Initiat, Cambridge, England
[15] Baker Heart & Diabet Inst, Cambridge Baker Syst Genom Initiat, Melbourne, Vic, Australia
[16] Univ Cambridge, British Heart Fdn Cardiovasc Epidemiol Unit, Dept Publ Hlth & Primary Care, Cambridge, England
[17] Univ Cambridge, Victor Phillip Dahdaleh Heart & Lung Res Inst, Cambridge, England
[18] Univ Cambridge, British Heart Fdn Cambridge Ctr Res Excellence, Sch Clin Med, Cambridge, England
[19] Hlth Data Res UK Cambridge, Wellcome Genome Campus, Cambridge, England
[20] Univ Cambridge, Cambridge, England
[21] Queen Mary Univ London, Blizard Inst, London, England
[22] Univ Helsinki, Dept Publ Hlth, Helsinki, Finland
[23] Massachusetts Gen Hosp, Cambridge, MA USA
[24] Broad Inst MIT & Harvard, Cambridge, MA USA
[25] Icahn Sch Med Mt Sinai, Windreich Dept Artificial Intelligence & Human Hlt, New York, NY USA
[26] Icahn Sch Med Mt Sinai, Hasso Plattner Inst Digital Hlth Mt Sinai, New York, NY USA
[27] Icahn Sch Med Mt Sinai, Dept Diagnost Mol & Intervent Radiol, New York, NY USA
Keywords
GENOME-WIDE ASSOCIATION; RISK SCORES; PREDICTION; EQUATION; INSIGHTS; CATALOG; DISEASE; MODELS; COMMON;
DOI
10.1016/j.ajhg.2024.06.003
Chinese Library Classification number
Q3 [Genetics]
Discipline classification codes
071007; 090102
Abstract
Methods of estimating polygenic scores (PGSs) from genome-wide association studies are increasingly utilized. However, independent method evaluation is lacking, and method comparisons are often limited. Here, we evaluate polygenic scores derived via seven methods in five biobank studies (totaling about 1.2 million participants) across 16 diseases and quantitative traits, building on a reference-standardized framework. We conducted meta-analyses to quantify the effects of method choice, hyperparameter tuning, method ensembling, and the target biobank on PGS performance. We found that no single method consistently outperformed all others. PGS effect sizes were more variable between biobanks than between methods within biobanks when methods were well tuned. Differences between methods were largest for the two investigated autoimmune diseases, seropositive rheumatoid arthritis and type 1 diabetes. For most methods, cross-validation was more reliable for tuning hyperparameters than automatic tuning (without the use of target data). For a given target phenotype, elastic net models combining PGS across methods (ensemble PGS) tuned in the UK Biobank provided consistent, high, and cross-biobank transferable performance, increasing PGS effect sizes (β coefficients) by a median of 5.0% relative to LDpred2 and MegaPRS (the two best-performing single methods when tuned with cross-validation). Our interactively browsable online results and open-source workflow prspipe provide a rich resource and reference for the analysis of polygenic scoring methods across biobanks.
Pages: 1431-1447
Number of pages: 18