Fast and Correct Gradient-Based Optimisation for Probabilistic Programming via Smoothing

Cited by: 2
Authors
Khajwal, Basim [1]
Ong, C-H Luke [1, 2]
Wagner, Dominik [1]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Nanyang Technol Univ, Singapore, Singapore
Source
PROGRAMMING LANGUAGES AND SYSTEMS, ESOP 2023 | 2023 / Vol. 13990
Keywords
probabilistic programming; variational inference; reparameterisation gradient; value semantics; type systems;
DOI
10.1007/978-3-031-30044-8_18
Chinese Library Classification (CLC) number
TP31 [Computer Software];
Discipline classification code
081202 ; 0835 ;
Abstract
We study the foundations of variational inference, which frames posterior inference as an optimisation problem, for probabilistic programming. The dominant approach for optimisation in practice is stochastic gradient descent. In particular, a variant using the so-called reparameterisation gradient estimator exhibits fast convergence in a traditional statistics setting. Unfortunately, discontinuities, which are readily expressible in programming languages, can compromise the correctness of this approach. We consider a simple (higher-order, probabilistic) programming language with conditionals, and we endow our language with both a measurable and a smoothed (approximate) value semantics. We present type systems which establish technical pre-conditions. Thus we can prove stochastic gradient descent with the reparameterisation gradient estimator to be correct when applied to the smoothed problem. Besides, we can solve the original problem up to any error tolerance by choosing an accuracy coefficient suitably. Empirically we demonstrate that our approach has a similar convergence as a key competitor, but is simpler, faster, and attains orders of magnitude reduction in work-normalised variance.
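The following is a minimal sketch, not the authors' implementation, of why a discontinuity can bias the reparameterisation gradient and how smoothing the conditional restores a usable estimator. The objective (the expectation over s ~ N(0, 1) of -0.5·θ² plus a step function of s + θ), the sigmoid smoothing, and all function names and the accuracy coefficient value `eta` are assumptions chosen for illustration.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def naive_grad(theta, s):
    # Pathwise derivative of  -0.5*theta**2 + (0 if s + theta < 0 else 1).
    # The branch is piecewise constant, so differentiating it sample by
    # sample contributes nothing and the jump is silently ignored.
    return -theta


def smoothed_grad(theta, s, eta):
    # Pathwise derivative of  -0.5*theta**2 + sigmoid((s + theta) / eta),
    # where the conditional is replaced by a sigmoid (illustrative choice).
    p = sigmoid((s + theta) / eta)
    return -theta + p * (1.0 - p) / eta


def true_grad(theta):
    # E[step(s + theta)] equals the standard normal CDF at theta, so its
    # derivative is the standard normal density at theta.
    return -theta + math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)


def estimate(grad_fn, theta, n, seed=0):
    # Monte Carlo average of a per-sample gradient over s ~ N(0, 1).
    rng = random.Random(seed)
    return sum(grad_fn(theta, rng.gauss(0.0, 1.0)) for _ in range(n)) / n


if __name__ == "__main__":
    theta, eta = 0.5, 0.1
    print("true gradient    :", true_grad(theta))
    print("naive estimate   :", estimate(naive_grad, theta, 100_000))
    print("smoothed estimate:",
          estimate(lambda th, s: smoothed_grad(th, s, eta), theta, 100_000))
```

In this sketch the naive estimator returns -θ for every sample, so it misses the density term entirely, whereas the smoothed estimator approaches the true gradient as `eta` shrinks, at the cost of higher per-sample variance.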
Pages: 479-506
Number of pages: 28