Prior-Preconditioned Conjugate Gradient Method for Accelerated Gibbs Sampling in "Large n, Large p" Bayesian Sparse Regression

Cited: 12
Authors
Nishimura, Akihiko [1 ]
Suchard, Marc A. [2 ]
Affiliations
[1] Johns Hopkins Univ, Dept Biostat, Baltimore, MD 21205 USA
[2] Univ Calif Los Angeles, Dept Biomath Biostat & Human Genet, Los Angeles, CA USA
Funding
U.S. National Science Foundation; U.S. National Institutes of Health
Keywords
Big data; Conjugate gradient; Markov chain Monte Carlo; Numerical linear algebra; Sparse matrix; Variable selection; VARIABLE SELECTION; HORSESHOE; INFERENCE; ITERATIONS; EQUATIONS; MODELS;
DOI
10.1080/01621459.2022.2057859
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
In modern observational studies based on healthcare databases, the number of observations typically ranges on the order of 10^5-10^6 and the number of predictors on the order of 10^4-10^5. Despite the large sample size, data rarely provide sufficient information to reliably estimate such a large number of parameters. Sparse regression techniques provide potential solutions, one notable approach being the Bayesian method based on shrinkage priors. In the "large n and large p" setting, however, the required posterior computation encounters a bottleneck at repeated sampling from a high-dimensional Gaussian distribution whose precision matrix Φ is expensive to compute and factorize. In this article, we present a novel algorithm to speed up this bottleneck based on the following observation: we can cheaply generate a random vector b such that the solution to the linear system Φβ = b has the desired Gaussian distribution. We can then solve the linear system by the conjugate gradient (CG) algorithm through matrix-vector multiplications by Φ; this involves no explicit factorization or calculation of Φ itself. Rapid convergence of CG in this context is guaranteed by the theory of prior-preconditioning we develop. We apply our algorithm to a clinically relevant large-scale observational study with n = 72,489 patients and p = 22,175 clinical covariates, designed to assess the relative risk of adverse events from two alternative blood anticoagulants. Our algorithm demonstrates an order-of-magnitude speed-up in posterior inference, in our case cutting the computation time from two weeks to less than a day. Supplementary materials for this article are available online.
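The sampling trick described in the abstract can be sketched at small scale as follows, assuming a Gaussian likelihood with unit noise variance and a conditionally Gaussian shrinkage prior with precision matrix D^{-1}, so that Φ = XᵀX + D^{-1}. The problem sizes and the names `X`, `y`, `d` are illustrative assumptions, not taken from the paper:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

# Illustrative sketch: sample beta ~ N(Phi^{-1} X^T y, Phi^{-1})
# with Phi = X^T X + D^{-1}, D = diag(d) the prior (shrinkage) variances.
# Sizes and data below are made up for demonstration.
rng = np.random.default_rng(0)
n, p = 500, 200
X = rng.standard_normal((n, p))      # design matrix
y = rng.standard_normal(n)           # response
d = rng.uniform(0.01, 1.0, size=p)   # assumed prior variances

# Matrix-vector product with Phi; Phi itself is never formed or factorized.
Phi = LinearOperator((p, p), matvec=lambda v: X.T @ (X @ v) + v / d)

# Draw b so that the solution of Phi beta = b has the target distribution:
# E[b] = X^T y and Cov(b) = X^T X + D^{-1} = Phi, hence Cov(beta) = Phi^{-1}.
eta1 = rng.standard_normal(n)
eta2 = rng.standard_normal(p)
b = X.T @ (y + eta1) + eta2 / np.sqrt(d)

# Prior-preconditioning: apply the prior covariance D as an approximate
# inverse of Phi inside CG.
M = LinearOperator((p, p), matvec=lambda v: d * v)

beta, info = cg(Phi, b, M=M)         # info == 0 signals convergence
```

Each Gibbs iteration would redraw `d` from the shrinkage prior's conditional and repeat the CG solve; the point of the paper is that prior-preconditioned CG converges in few iterations when the prior concentrates most coordinates near zero.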
Pages: 2468-2481
Page count: 14
Related papers
3 items
  • [1] A new accelerated conjugate gradient method for large-scale unconstrained optimization
    Chen, Yuting
    Cao, Mingyuan
    Yang, Yueting
    JOURNAL OF INEQUALITIES AND APPLICATIONS, 2019, 2019 (01)
  • [2] Bayesian nonlinear regression for large p small n problems
    Chakraborty, Sounak
    Ghosh, Malay
    Mallick, Bani K.
    JOURNAL OF MULTIVARIATE ANALYSIS, 2012, 108 : 28 - 40
  • [3] Gradient boosting: A computationally efficient alternative to Markov chain Monte Carlo sampling for fitting large Bayesian spatio-temporal binomial regression models
    Huang, Rongjie
    McMahan, Christopher
    Herrin, Brian
    McLain, Alexander
    Cai, Bo
    Self, Stella
    INFECTIOUS DISEASE MODELLING, 2025, 10 (01) : 189 - 200