Fast and scalable ensemble learning method for versatile polygenic risk prediction

Cited by: 0
Authors
Chen, Tony [1]
Zhang, Haoyu [2]
Mazumder, Rahul [3]
Lin, Xihong [1, 4]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02215 USA
[2] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20814 USA
[3] MIT, Sloan Sch Management, Operat Res & Stat Grp, Cambridge, MA 02139 USA
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Keywords
polygenic risk scores; ensemble learning; L0Learn; penalized regression; linkage disequilibrium; selection; regression; accuracy; disease; models; regularization; association; insights; score
DOI
10.1073/pnas.2403210121
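Chinese Library Classification
O [Mathematical Sciences and Chemistry]; P [Astronomy and Earth Sciences]; Q [Biological Sciences]; N [General Natural Sciences]
Discipline classification codes
07; 0710; 09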
Abstract
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods are limited by computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, 15 times faster computation, and half the memory usage compared with current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
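Note: the abstract does not state the objective function. As a rough sketch of what an L0L2 penalized regression on summary-level data typically looks like, one can start from the standard individual-level L0L2 problem and substitute summary statistics. Everything below is an assumption for illustration, not the paper's exact formulation: genotypes X and phenotype y are taken as standardized, R denotes a linkage disequilibrium (LD) correlation matrix from a reference panel, r denotes the marginal GWAS correlations, and the grid of tuning parameters and the ensemble weights w_k are hypothetical notation.

```latex
% Individual-level L0L2 penalized regression (generic form, assumed):
\hat{\beta}(\lambda_0,\lambda_2)
  = \arg\min_{\beta}\;
    \frac{1}{2n}\,\lVert y - X\beta \rVert_2^2
    + \lambda_0 \lVert \beta \rVert_0
    + \lambda_2 \lVert \beta \rVert_2^2 .

% With standardized data, X^\top X / n \approx R and X^\top y / n \approx r,
% so up to a constant that does not depend on \beta:
\hat{\beta}(\lambda_0,\lambda_2)
  = \arg\min_{\beta}\;
    \tfrac{1}{2}\,\beta^\top R\,\beta - \beta^\top r
    + \lambda_0 \lVert \beta \rVert_0
    + \lambda_2 \lVert \beta \rVert_2^2 .

% Ensemble across a grid of K tuning-parameter pairs (hypothetical notation):
% g is an individual's genotype vector, and the weights w_k are fit on a
% separate tuning sample.
\mathrm{PRS}_{\mathrm{ens}}(g)
  = \sum_{k=1}^{K} w_k \; g^\top \hat{\beta}\bigl(\lambda_0^{(k)},\lambda_2^{(k)}\bigr).
```

In this generic form, the L0 term controls how many variants enter the score and the L2 term shrinks their effect sizes, which is one plausible reading of how combining fits across tuning parameters could adapt to traits of differing polygenicity.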
Pages: 9