Pathfinder: Parallel quasi-Newton variational inference

Cited by: 0
Authors
Zhang, Lu [1 ]
Carpenter, Bob [2 ]
Gelman, Andrew [3 ]
Vehtari, Aki [4 ]
Affiliations
[1] Univ Southern Calif, Div Biostat, Dept Populat & Publ Hlth Sci, Los Angeles, CA 90032 USA
[2] Flatiron Inst, Ctr Computat Math, 162 5th Ave, New York, NY 10010 USA
[3] Columbia Univ, Dept Stat & Polit Sci, New York, NY 10027 USA
[4] Aalto Univ, Dept Comp Sci, Aalto 00076, Finland
Funding
U.S. National Science Foundation; U.S. National Institutes of Health
Keywords
Variational inference; Hamiltonian Monte Carlo; quasi-Newton optimization; Laplace approximation; importance resampling
DOI
Not available
CLC Classification Number
TP [Automation and Computer Technology]
Subject Classification Code
0812
Abstract
We propose Pathfinder, a variational method for approximately sampling from differentiable probability densities. Starting from a random initialization, Pathfinder locates normal approximations to the target density along a quasi-Newton optimization path, with local covariance estimated using the inverse Hessian estimates produced by the optimizer. Pathfinder returns draws from the approximation with the lowest estimated Kullback-Leibler (KL) divergence to the target distribution. We evaluate Pathfinder on a wide range of posterior distributions, demonstrating that its approximate draws are better than those from automatic differentiation variational inference (ADVI) and comparable to those produced by short chains of dynamic Hamiltonian Monte Carlo (HMC), as measured by 1-Wasserstein distance. Compared to ADVI and short dynamic HMC runs, Pathfinder requires one to two orders of magnitude fewer log density and gradient evaluations, with greater reductions for more challenging posteriors. Importance resampling over multiple runs of Pathfinder improves the diversity of approximate draws, reducing 1-Wasserstein distance further and providing a measure of robustness to optimization failures on plateaus, saddle points, or in minor modes. The Monte Carlo KL divergence estimates are embarrassingly parallelizable in the core Pathfinder algorithm, as are multiple runs in the resampling version, further increasing Pathfinder's speed advantage with multiple cores.
Pages: 49
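
To make the algorithm described in the abstract concrete, the following is a minimal single-path sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the names pathfinder_sketch and numerical_hessian are hypothetical, the local covariance comes from an explicit finite-difference Hessian rather than the low-rank inverse-Hessian factors that L-BFGS maintains internally (which is what the paper uses), and the multi-path importance-resampling step is omitted. The sketch traces an L-BFGS optimization path with SciPy, fits a normal approximation at each iterate, estimates the ELBO by Monte Carlo, and returns draws from the approximation with the highest ELBO, i.e., the lowest estimated KL divergence to the target up to the unknown normalizing constant.

import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

def numerical_hessian(f, x, eps=1e-4):
    # Symmetric finite-difference Hessian of f at x (illustration only;
    # the paper instead reuses the optimizer's inverse-Hessian estimates).
    d = x.size
    H = np.empty((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d)
            ej = np.zeros(d)
            ei[i] = eps
            ej[j] = eps
            H[i, j] = (f(x + ei + ej) - f(x + ei)
                       - f(x + ej) + f(x)) / eps**2
    return 0.5 * (H + H.T)

def pathfinder_sketch(log_p, grad_log_p, x0, n_mc=30, seed=0):
    rng = np.random.default_rng(seed)
    path = []
    # Trace the quasi-Newton optimization path toward a mode of log_p.
    minimize(lambda x: -log_p(x), x0, jac=lambda x: -grad_log_p(x),
             method="L-BFGS-B", callback=lambda xk: path.append(xk.copy()))
    best_draws, best_elbo = None, -np.inf
    for mu in path:
        try:
            Sigma = np.linalg.inv(numerical_hessian(lambda x: -log_p(x), mu))
            q = multivariate_normal(mean=mu, cov=Sigma)
        except (np.linalg.LinAlgError, ValueError):
            continue  # local quadratic model is not positive definite here
        draws = q.rvs(size=n_mc, random_state=rng)
        # Monte Carlo ELBO: E_q[log p - log q]; maximizing it over path
        # points minimizes KL(q || p) up to the unknown normalizing constant.
        elbo = np.mean([log_p(x) - q.logpdf(x) for x in draws])
        if elbo > best_elbo:
            best_draws, best_elbo = draws, elbo
    return best_draws

# Example: approximate draws from a correlated 2-D Gaussian target.
P = np.array([[2.0, 0.9], [0.9, 1.0]])            # precision matrix
draws = pathfinder_sketch(lambda x: -0.5 * x @ P @ x,
                          lambda x: -P @ x,
                          x0=np.array([4.0, -3.0]))

Skipping iterates whose Hessian is not positive definite mirrors the need for a valid covariance matrix; the quasi-Newton approach described in the paper avoids this check by construction, since BFGS-style inverse-Hessian updates stay positive definite when the curvature condition holds.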