Fast and scalable ensemble learning method for versatile polygenic risk prediction

Cited by: 0
Authors
Chen, Tony [1]
Zhang, Haoyu [2]
Mazumder, Rahul [3]
Lin, Xihong [1, 4]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02215 USA
[2] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20814 USA
[3] MIT, Sloan Sch Management, Operat Res & Stat Grp, Cambridge, MA 02139 USA
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Keywords
polygenic risk scores; ensemble learning; L0Learn; penalized regression; LINKAGE DISEQUILIBRIUM; SELECTION; REGRESSION; ACCURACY; DISEASE; MODELS; REGULARIZATION; ASSOCIATION; INSIGHTS; COMMON
DOI
10.1073/pnas.2403210121
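Chinese Library Classification codes
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09
Abstract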
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods are limited in computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, 15 times faster computation, and half the memory of current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
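
The abstract mentions an L0L2 penalized regression fitted from summary statistics and an ensemble across tuning parameters. As a rough illustration only, the sketch below applies cyclic coordinate descent to one common summary-statistic formulation, minimizing b'Rb - 2b'r + lam0*||b||_0 + lam2*||b||_2^2, where r holds marginal SNP-trait correlations and R is an LD matrix. The objective's exact scaling, the unit-diagonal assumption on R, the function names, and the caller-supplied ensemble weights are all assumptions made for this example; this is not the ALL-Sum implementation or the API of the R package.

```python
import math


def l0l2_coordinate_descent(r, R, lam0, lam2, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for a hypothetical summary-statistic objective:
        b'Rb - 2 b'r + lam0 * ||b||_0 + lam2 * ||b||_2^2
    Assumes R is symmetric with unit diagonal (standardized genotypes)."""
    p = len(r)
    beta = [0.0] * p
    # A coordinate stays nonzero only if it lowers the objective by more than lam0.
    threshold = math.sqrt(lam0 * (1.0 + lam2))
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # Correlation left unexplained by the other coordinates.
            c = r[j] - sum(R[j][k] * beta[k] for k in range(p) if k != j)
            new = c / (1.0 + lam2) if abs(c) > threshold else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def fit_grid(r, R, grid):
    """Fit one coefficient vector per (lam0, lam2) pair in the tuning grid."""
    return [l0l2_coordinate_descent(r, R, lam0, lam2) for lam0, lam2 in grid]


def combine(betas, weights):
    """Weighted sum of candidate coefficient vectors (weights supplied by the caller)."""
    p = len(betas[0])
    return [sum(w * b[j] for w, b in zip(weights, betas)) for j in range(p)]


# Toy example: two uncorrelated SNPs, two tuning-parameter settings, equal weights.
r_toy = [0.5, 0.05]
R_toy = [[1.0, 0.0], [0.0, 1.0]]
candidates = fit_grid(r_toy, R_toy, [(0.01, 0.0), (0.01, 1.0)])
ensemble_beta = combine(candidates, [0.5, 0.5])
```

In this sketch the ensemble weights are fixed by hand; in practice they would more plausibly be learned on a separate tuning dataset, which is omitted here.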
Pages: 9