Fast and accurate Bayesian polygenic risk modeling with variational inference

Cited by: 12
Authors
Zabad, Shadi [1]
Gravel, Simon [2]
Li, Yue [1]
Affiliations
[1] McGill Univ, Sch Comp Sci, Montreal, PQ, Canada
[2] McGill Univ, Dept Human Genet, Montreal, PQ, Canada
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
GENOME-WIDE ASSOCIATION; HUMAN COMPLEX TRAITS; UK BIOBANK; VARIABLE SELECTION; MIXED-MODEL; PREDICTION; SCORES; RARE; REGRESSION; VARIANTS;
DOI
10.1016/j.ajhg.2023.03.009
Chinese Library Classification number
Q3 [Genetics];
Discipline classification code
071007; 090102;
Abstract
The advent of large-scale genome-wide association studies (GWASs) has motivated the development of statistical methods for phenotype prediction with single-nucleotide polymorphism (SNP) array data. These polygenic risk score (PRS) methods use a multiple linear regression framework to infer joint effect sizes of all genetic variants on the trait. Among the subset of PRS methods that operate on GWAS summary statistics, sparse Bayesian methods have shown competitive predictive ability. However, most existing Bayesian approaches employ Markov chain Monte Carlo (MCMC) algorithms for posterior inference, which are computationally inefficient and do not scale favorably to higher dimensions. Here, we introduce variational inference of polygenic risk scores (VIPRS), a Bayesian summary statistics-based PRS method that utilizes variational inference techniques to approximate the posterior distribution for the effect sizes. Our experiments with 36 simulation configurations and 12 real phenotypes from the UK Biobank dataset demonstrated that VIPRS is consistently competitive with the state-of-the-art in prediction accuracy while being more than twice as fast as popular MCMC-based approaches. This performance advantage is robust across a variety of genetic architectures, SNP heritabilities, and independent GWAS cohorts. In addition to its competitive accuracy on the "White British" samples, VIPRS showed improved transferability when applied to other ethnic groups, with up to a 1.7-fold increase in R² among individuals of Nigerian ancestry for low-density lipoprotein (LDL) cholesterol. To illustrate its scalability, we applied VIPRS to a dataset of 9.6 million genetic markers, which conferred further improvements in prediction accuracy for highly polygenic traits, such as height.
Pages: 741-761
Page count: 22