A convergence analysis for a class of practical variance-reduction stochastic gradient MCMC

Cited by: 0
Authors:
Changyou CHEN [1]
Wenlin WANG [2]
Yizhe ZHANG [3]
Qinliang SU [4]
Lawrence CARIN [2]
Affiliations:
[1] Department of Computer Science and Engineering
[2] Department of Electrical and Computer Engineering, Duke University
[3] Microsoft Research
[4] School of Data and Computer Science, Sun Yat-sen University
Keywords:
Markov chain Monte Carlo; SG-MCMC; variance reduction; deep neural networks
DOI: Not available
Chinese Library Classification (CLC): TP18 [Artificial intelligence theory]
Discipline classification codes: 081104; 0812; 0835; 1405
Abstract:
Stochastic gradient Markov chain Monte Carlo (SG-MCMC) has been developed as a flexible family of scalable Bayesian sampling algorithms. However, there has been little theoretical analysis of the impact of minibatch size on an algorithm's convergence rate. In this paper, we prove that at the beginning of an SG-MCMC run, i.e., under a limited computational budget/time, a larger minibatch size leads to a faster decrease of the mean-squared-error bound. This is because stochastic gradients computed on small minibatches carry prominent noise, which motivates variance reduction in SG-MCMC for practical use. Borrowing ideas from stochastic optimization, we propose a simple and practical variance-reduction technique for SG-MCMC that is efficient in both computation and storage. More importantly, we develop theory proving that our algorithm achieves a faster convergence rate than standard SG-MCMC. A number of large-scale experiments, ranging from Bayesian learning of logistic regression to deep neural networks, validate the theory and demonstrate the superiority of the proposed variance-reduction SG-MCMC framework.
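The variance-reduction idea described in the abstract follows the control-variate pattern from stochastic optimization (e.g., SVRG; compare related articles [11] and [17] below). The following is a minimal sketch only, assuming an SVRG-style correction applied to stochastic gradient Langevin dynamics; the names `svrg_sgld` and `grad_i` and the scaling conventions are illustrative assumptions, not the authors' exact algorithm.

```python
import numpy as np

def svrg_sgld(grad_i, theta0, n_data, n_epochs, batch_size, step_size, seed=0):
    """SVRG-style variance-reduced SGLD (illustrative sketch).

    grad_i(theta, idx): sum of per-example log-likelihood gradients over
    the index set `idx` (the prior term is omitted for brevity).
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    samples = []
    for _ in range(n_epochs):
        # Snapshot step: full-data gradient at an anchor point, recomputed
        # once per epoch; it serves as the control variate.
        anchor = theta.copy()
        full_grad = grad_i(anchor, np.arange(n_data))
        for _ in range(n_data // batch_size):
            idx = rng.choice(n_data, size=batch_size, replace=False)
            # Variance-reduced estimate of the full-data gradient: minibatch
            # difference around the anchor plus the snapshot gradient.
            g = (n_data / batch_size) * (grad_i(theta, idx) - grad_i(anchor, idx)) + full_grad
            # Langevin update: half-step drift plus Gaussian noise of variance step_size.
            theta = theta + 0.5 * step_size * g + rng.normal(0.0, np.sqrt(step_size), size=theta.shape)
            samples.append(theta.copy())
    return samples
```

The corrected estimate stays unbiased for the full-data gradient while its variance shrinks as `theta` remains close to the anchor, so the injected-noise term dominates the minibatch noise; this reduced gradient noise is what drives the faster decrease of the mean-squared-error bound discussed in the abstract.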
Pages: 67-79 (13 pages)
Related articles (50 in total):
  • [11] Dubey, Avinava; Reddi, Sashank J.; Poczos, Barnabas; Smola, Alexander J.; Xing, Eric P. Variance Reduction in Stochastic Gradient Langevin Dynamics. Advances in Neural Information Processing Systems 29 (NIPS 2016), 2016, 29.
  • [12] Jin, Xiao-Bo; Zhang, Xu-Yao; Huang, Kaizhu; Geng, Guang-Gang. Stochastic Conjugate Gradient Algorithm with Variance Reduction. IEEE Transactions on Neural Networks and Learning Systems, 2019, 30(5): 1360-1369.
  • [13] Dragomir, Radu-Alexandru; Even, Mathieu; Hendrikx, Hadrien. Fast Stochastic Bregman Gradient Methods: Sharp Analysis and Variance Reduction. International Conference on Machine Learning, 2021, 139.
  • [14] Kinoshita, Yuri; Suzuki, Taiji. Improved Convergence Rate of Stochastic Gradient Langevin Dynamics with Variance Reduction and its Application to Optimization. Advances in Neural Information Processing Systems 35 (NeurIPS 2022), 2022.
  • [15] Ming, Yuewei; Zhao, Yawei; Wu, Chengkun; Li, Kuan; Yin, Jianping. Distributed and Asynchronous Stochastic Gradient Descent with Variance Reduction. Neurocomputing, 2018, 281: 27-36.
  • [16] Zhang, Junyu; Xiao, Lin. A Stochastic Composite Gradient Method with Incremental Variance Reduction. Advances in Neural Information Processing Systems 32 (NIPS 2019), 2019, 32.
  • [17] Chatterji, Niladri S.; Flammarion, Nicolas; Ma, Yi-An; Bartlett, Peter L.; Jordan, Michael I. On the Theory of Variance Reduction for Stochastic Gradient Monte Carlo. International Conference on Machine Learning, 2018, 80.
  • [18] Xiao, Lin; Zhang, Tong. Proximal Stochastic Gradient Method with Progressive Variance Reduction. SIAM Journal on Optimization, 2014, 24(4): 2057-2075.
  • [19] Starkov, A. V.; Noormohammadian, M.; Oppel, U. G. A Stochastic Model and a Variance-Reduction Monte-Carlo Method for the Calculation of Light Transport. Applied Physics B: Lasers and Optics, 1995, 60(4): 335-340.
  • [20] Pacelli, A.; Ravaioli, U. Analysis of Variance-Reduction Schemes for Ensemble Monte Carlo Simulation of Semiconductor Devices. Solid-State Electronics, 1997, 41(4): 599-605.