An algorithm for distributed Bayesian inference

Cited: 4
Authors
Shyamalkumar, Nariankadu D. [1 ]
Srivastava, Sanvesh [1 ]
Affiliations
[1] Univ Iowa, Dept Stat & Actuarial Sci, Iowa City, IA 52242 USA
Source
STAT | 2022, Vol. 11, Issue 1
Funding
U.S. National Science Foundation;
Keywords
data augmentation; distributed computing; divide-and-conquer; location-scatter family; Monte Carlo computations; Wasserstein distance; BARYCENTERS; MODELS;
DOI
10.1002/sta4.432
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Discipline Codes
020208; 070103; 0714
Abstract
Monte Carlo algorithms, such as Markov chain Monte Carlo (MCMC) and Hamiltonian Monte Carlo (HMC), are routinely used for Bayesian inference; however, these algorithms are prohibitively slow in massive data settings because they require multiple passes through the full data in every iteration. Addressing this problem, we develop a scalable extension of these algorithms using the divide-and-conquer (D&C) technique, which divides the data into a sufficiently large number of subsets, draws parameters in parallel on the subsets using a powered likelihood, and produces Monte Carlo draws of the parameter by combining the parameter draws obtained from each subset. The combined parameter draws play the role of draws from the original sampling algorithm. Our main contributions are twofold. First, we demonstrate through diverse simulated and real data analyses focusing on generalized linear models (GLMs) that our distributed algorithm delivers results comparable to current state-of-the-art D&C algorithms in terms of statistical accuracy and computational efficiency. Second, providing theoretical support for our empirical observations, we identify regularity assumptions under which the proposed algorithm leads to asymptotically optimal inference. We also provide illustrative examples focusing on normal linear and logistic regressions where parts of our D&C algorithm are analytically tractable.
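The abstract's three-step recipe (partition the data, sample each subset posterior under a likelihood raised to the number of subsets, then combine the subset draws) can be illustrated with a minimal toy sketch. This is not the paper's implementation: it uses a conjugate normal-mean model so subset draws are exact rather than MCMC output, and it combines draws via the one-dimensional Wasserstein-2 barycenter (averaging sorted draws across subsets), which is one standard recipe in this family of methods. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: normal mean model with known variance sigma, so each subset
# posterior is available in closed form (no MCMC needed for the sketch).
n, k = 10_000, 10                # full sample size, number of subsets
theta_true, sigma = 2.0, 1.0
y = rng.normal(theta_true, sigma, size=n)

subsets = np.array_split(rng.permutation(y), k)
m = 2_000                        # Monte Carlo draws per subset

draws = []
for ys in subsets:
    # Powered likelihood: raise the subset likelihood to the power k,
    # which for this Gaussian model simply rescales its precision so the
    # subset posterior mimics the full-data posterior's spread.
    prec = k * len(ys) / sigma**2          # powered-likelihood precision
    post_var = 1.0 / (prec + 1.0)          # N(0, 1) prior on theta
    post_mean = post_var * prec * ys.mean()
    draws.append(rng.normal(post_mean, np.sqrt(post_var), size=m))

# Combine: in one dimension, the Wasserstein-2 barycenter of k empirical
# measures with m atoms each is obtained by averaging sorted draws.
combined = np.mean([np.sort(d) for d in draws], axis=0)

print(f"posterior mean ~ {combined.mean():.3f}, sd ~ {combined.std():.4f}")
```

The combined draws concentrate around the full-data posterior mean with the full-data posterior's scale, which is the role the abstract assigns to the combined parameter draws; in higher dimensions the barycenter step requires the fixed-point computations for location-scatter families cited in the keywords.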
Pages: 14