Weak approximation of transformed stochastic gradient MCMC

Cited by: 1
Authors
Yokoi, Soma [1 ,2 ]
Otsuka, Takuma [3 ]
Sato, Issei [2 ,4 ]
Affiliations
[1] Univ Tokyo, Grad Sch Frontier Sci, Dept Complex Sci & Engn, 5-1-5 Kashiwanoha, Kashiwa, Chiba 2778561, Japan
[2] RIKEN, Chuo Ku, 1-4-1 Nihonbashi, Tokyo 1030027, Japan
[3] NTT Corp, NTT Commun Sci Labs, 2-4 Hikaridai,Seika Cho, Kyoto 6190237, Japan
[4] Univ Tokyo, Grad Sch Informat Sci & Technol, Dept Comp Sci, Bunkyo Ku, 7-3-1 Hongo, Tokyo 1130033, Japan
Keywords
Stochastic gradient MCMC; Transform; Convergence analysis; Itô process; Langevin
DOI
10.1007/s10994-020-05904-5
CLC number
TP18 [Artificial intelligence theory]
Discipline codes
081104; 0812; 0835; 1405
Abstract
Stochastic gradient Langevin dynamics (SGLD) is a computationally efficient sampler for Bayesian posterior inference given a large-scale dataset and a complex model. Although SGLD is designed for unbounded random variables, practical models often involve variables constrained to a bounded domain, such as non-negative values or a finite interval. Variable transformation is a typical way to handle such bounded variables. This paper reveals, from both theoretical and empirical perspectives, that several mapping approaches commonly used in the literature produce erroneous samples. We show that changing the random variable in the discretization using an invertible Lipschitz mapping function avoids this pitfall and attains weak convergence, while the other methods are numerically unstable or cannot be justified theoretically. Experiments demonstrate its efficacy for widely used models with bounded latent variables, including Bayesian non-negative matrix factorization and binary neural networks.
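The abstract describes the change-of-variable idea at a high level. Below is a minimal sketch (in Python/NumPy, not the authors' code) of Langevin dynamics run on a softplus-reparameterized variable: a non-negative target θ is written as θ = softplus(x) with x unbounded, and the log-Jacobian of the transform is folded into the log-density before discretizing. The Gamma(3, 2) target, step size, and iteration counts are illustrative assumptions; in the paper's large-scale setting, the exact gradient would be replaced by a minibatch stochastic gradient.

```python
# Hypothetical sketch: Langevin dynamics on a softplus-transformed variable.
# Target pi(theta) is Gamma(a, b) on theta > 0 (an assumption for illustration);
# we sample x in R with density pi_x(x) = pi(softplus(x)) * sigmoid(x),
# since d softplus(x)/dx = sigmoid(x) is the Jacobian of the transform.
import numpy as np

def softplus(x):
    # Numerically stable log(1 + exp(x))
    return np.maximum(x, 0.0) + np.log1p(np.exp(-np.abs(x)))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

a, b = 3.0, 2.0  # Gamma shape/rate; true mean is a / b = 1.5

def grad_log_pi_x(x):
    theta = softplus(x)
    score = (a - 1.0) / theta - b        # d/d theta of log Gamma density
    # Chain rule plus derivative of the log-Jacobian, log sigmoid(x):
    return score * sigmoid(x) + (1.0 - sigmoid(x))

rng = np.random.default_rng(0)
eps = 1e-3                               # step size (assumed, not tuned)
x, samples = 0.0, []
for t in range(200_000):
    # Euler-Maruyama discretization of Langevin dynamics in x-space
    x += eps * grad_log_pi_x(x) + np.sqrt(2.0 * eps) * rng.standard_normal()
    if t >= 50_000:                      # discard burn-in
        samples.append(softplus(x))      # map back to the bounded domain

print(np.mean(samples))                  # should be close to a / b = 1.5
```

Softplus is invertible with derivative bounded by 1, in the spirit of the invertible Lipschitz mappings the abstract advocates; a transform such as θ = exp(x), by contrast, has an unbounded Jacobian and can be numerically unstable.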
Pages: 1903-1923
Number of pages: 21