Fast and Correct Gradient-Based Optimisation for Probabilistic Programming via Smoothing

Cited by: 2
Authors
Khajwal, Basim [1]
Ong, C-H Luke [1, 2]
Wagner, Dominik [1]
Affiliations
[1] Univ Oxford, Oxford, England
[2] Nanyang Technol Univ, Singapore, Singapore
Source
PROGRAMMING LANGUAGES AND SYSTEMS, ESOP 2023 | 2023 / Vol. 13990
Keywords
probabilistic programming; variational inference; reparameterisation gradient; value semantics; type systems;
DOI
10.1007/978-3-031-30044-8_18
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
We study the foundations of variational inference, which frames posterior inference as an optimisation problem, for probabilistic programming. The dominant approach for optimisation in practice is stochastic gradient descent. In particular, a variant using the so-called reparameterisation gradient estimator exhibits fast convergence in a traditional statistics setting. Unfortunately, discontinuities, which are readily expressible in programming languages, can compromise the correctness of this approach. We consider a simple (higher-order, probabilistic) programming language with conditionals, and we endow our language with both a measurable and a smoothed (approximate) value semantics. We present type systems which establish technical pre-conditions. Thus we can prove stochastic gradient descent with the reparameterisation gradient estimator to be correct when applied to the smoothed problem. Besides, we can solve the original problem up to any error tolerance by choosing an accuracy coefficient suitably. Empirically we demonstrate that our approach has a similar convergence as a key competitor, but is simpler, faster, and attains orders of magnitude reduction in work-normalised variance.
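The failure mode and the smoothing remedy described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the toy objective, the choice of a logistic sigmoid as the smoothing function, and the names `eta`, `smoothed_grad_sample`, and `estimate_gradient` are all assumptions made for illustration. The objective is E[f(z)] with z ~ N(theta, 1) and f(z) = 1 if z >= 0, else 0. Reparameterising as z = theta + eps with eps ~ N(0, 1), the exact gradient with respect to theta is the standard normal density at theta, yet differentiating the branch sample by sample gives 0 almost everywhere. Replacing the branch by sigmoid(z / eta) yields a differentiable surrogate whose gradient estimate approaches the exact value as eta shrinks.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def naive_grad_sample(theta, eps):
    """Per-sample derivative of the discontinuous branch
    f(theta + eps) = 1 if theta + eps >= 0 else 0.
    The branch is piecewise constant, so the derivative is 0 almost everywhere."""
    return 0.0


def smoothed_grad_sample(theta, eps, eta):
    """Per-sample derivative of the smoothed branch sigmoid((theta + eps) / eta).
    eta is an assumed name for the accuracy coefficient; smaller is sharper."""
    s = sigmoid((theta + eps) / eta)
    return s * (1.0 - s) / eta


def estimate_gradient(theta, eta, n_samples, seed=0):
    """Monte Carlo reparameterisation gradient of the smoothed objective."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)  # noise does not depend on theta
        total += smoothed_grad_sample(theta, eps, eta)
    return total / n_samples


def true_gradient(theta):
    """Exact gradient of P(theta + eps >= 0): the standard normal density."""
    return math.exp(-0.5 * theta * theta) / math.sqrt(2.0 * math.pi)


if __name__ == "__main__":
    theta = 0.5
    print("exact gradient:   ", true_gradient(theta))
    print("naive estimate:   ", naive_grad_sample(theta, 0.3))
    for eta in (0.5, 0.1):
        print(f"smoothed, eta={eta}:", estimate_gradient(theta, eta, 100_000))
```

In this sketch the smoothed estimator trades a small bias that depends on `eta` for a usable gradient signal. The variance of each sample grows as `eta` shrinks, so the coefficient has to balance accuracy against noise.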
Pages: 479-506
Number of pages: 28