Fast and scalable ensemble learning method for versatile polygenic risk prediction

Authors
Chen, Tony [1]
Zhang, Haoyu [2]
Mazumder, Rahul [3]
Lin, Xihong [1, 4]
Affiliations
[1] Harvard TH Chan Sch Publ Hlth, Dept Biostat, Boston, MA 02215 USA
[2] NCI, Div Canc Epidemiol & Genet, Bethesda, MD 20814 USA
[3] MIT, Sloan Sch Management, Operat Res & Stat Grp, Cambridge, MA 02139 USA
[4] Harvard Univ, Dept Stat, Cambridge, MA 02138 USA
Keywords
polygenic risk scores; ensemble learning; L0Learn; penalized regression; linkage disequilibrium; selection; regression; accuracy; disease; models; regularization; association; insights; common
DOI
10.1073/pnas.2403210121
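Chinese Library Classification
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences]
Discipline classification codes
07; 0710; 09
Abstract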
Polygenic risk scores (PRS) enhance population risk stratification and advance personalized medicine, but existing methods are limited in computational burden, predictive accuracy, and adaptability to a wide range of genetic architectures. To address these issues, we propose Aggregated L0Learn using Summary-level data (ALL-Sum), a fast and scalable ensemble learning method for computing PRS using summary statistics from genome-wide association studies (GWAS). ALL-Sum leverages an L0L2 penalized regression and ensemble learning across tuning parameters to flexibly model traits with diverse genetic architectures. In extensive large-scale simulations across a wide range of polygenicity and GWAS sample sizes, ALL-Sum consistently outperformed popular alternative methods in terms of prediction accuracy, runtime, and memory usage by 10%, 20-fold, and threefold, respectively, and demonstrated robustness to diverse genetic architectures. We validated the performance of ALL-Sum in real data analysis of 11 complex traits using GWAS summary statistics from nine data sources, including the Global Lipids Genetics Consortium, Breast Cancer Association Consortium, and FinnGen Biobank, with validation in the UK Biobank. Our results show that, on average, ALL-Sum obtained PRS with 25% higher accuracy, 15 times faster computation, and half the memory usage compared with current state-of-the-art methods, and had robust performance across a wide range of traits and diseases. Furthermore, our method demonstrates stable prediction when using linkage disequilibrium computed from different data sources. ALL-Sum is available as a user-friendly R software package with publicly available reference data for streamlined analysis.
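The abstract names the two ingredients of the method (an L0L2 penalized regression on summary statistics, and an ensemble across tuning parameters) but does not state the objective function or the algorithm. The following is a minimal sketch of how such a pipeline could look, under assumptions that are not taken from the paper: the objective 0.5 * b'Rb - r'b + lam0 * ||b||_0 + lam2 * ||b||_2^2 (with R an LD correlation matrix and r the marginal GWAS effects), a plain cyclic coordinate descent solver, and an ensemble formed by least-squares regression of a tuning-set phenotype on the per-model scores. All function and parameter names are hypothetical; this is not the ALL-Sum R package's implementation or API.

```python
import numpy as np


def l0l2_coordinate_descent(R, r, lam0, lam2, n_iter=100, tol=1e-8):
    """Cyclic coordinate descent for the ASSUMED summary-statistic objective
        0.5 * b'Rb - r'b + lam0 * ||b||_0 + lam2 * ||b||_2^2
    R: (p, p) LD (SNP correlation) matrix; r: (p,) marginal GWAS effects.
    """
    R = np.asarray(R, dtype=float)
    r = np.asarray(r, dtype=float)
    beta = np.zeros(r.shape[0])
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(beta.shape[0]):
            # Partial residual for SNP j, excluding its own current effect.
            c = r[j] - R[j] @ beta + R[j, j] * beta[j]
            denom = R[j, j] + 2.0 * lam2
            # Keep the ridge-shrunken update only if it lowers the
            # objective by more than the L0 cost lam0; otherwise set to 0.
            new = c / denom if c * c > 2.0 * lam0 * denom else 0.0
            max_change = max(max_change, abs(new - beta[j]))
            beta[j] = new
        if max_change < tol:
            break
    return beta


def fit_grid(R, r, lam0_grid, lam2_grid):
    """Fit one model per (lam0, lam2) pair; returns a (p, K) matrix."""
    cols = [l0l2_coordinate_descent(R, r, a, b)
            for a in lam0_grid for b in lam2_grid]
    return np.column_stack(cols)


def ensemble_effects(B, G_tune, y_tune):
    """Combine the K candidate effect vectors in B (p, K) into one.

    ASSUMPTION: weights come from ordinary least squares of the centered
    tuning phenotype on the centered per-model scores G_tune @ B.
    """
    S = G_tune @ B
    S = S - S.mean(axis=0)
    y = y_tune - y_tune.mean()
    w, *_ = np.linalg.lstsq(S, y, rcond=None)
    return B @ w
```

In this sketch, `fit_grid` would be called once with the summary statistics, and `ensemble_effects` would then need individual-level genotypes and phenotypes from a separate tuning sample. Whether ALL-Sum forms its ensemble this way is not stated in the abstract.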
Pages: 9