Accelerating variance-reduced stochastic gradient methods

Cited by: 0
Authors
Derek Driggs
Matthias J. Ehrhardt
Carola-Bibiane Schönlieb
Affiliations
[1] University of Cambridge, Department of Applied Mathematics and Theoretical Physics
[2] University of Bath, Institute for Mathematical Innovation
Source
Mathematical Programming | 2022, Vol. 191
Keywords
Stochastic optimisation; Convex optimisation; Variance reduction; Accelerated gradient descent; 90C06; 90C15; 90C25; 90C30; 90C60; 68Q25
DOI: Not available
Abstract
Variance reduction is a crucial tool for improving the slow convergence of stochastic gradient descent. Only a few variance-reduced methods, however, have yet been shown to directly benefit from Nesterov’s acceleration techniques to match the convergence rates of accelerated gradient methods. Such approaches rely on “negative momentum”, a technique for further variance reduction that is generally specific to the SVRG gradient estimator. In this work, we show for the first time that negative momentum is unnecessary for acceleration and develop a universal acceleration framework that allows all popular variance-reduced methods to achieve accelerated convergence rates. The constants appearing in these rates, including their dependence on the number of functions n, scale with the mean-squared-error and bias of the gradient estimator. In a series of numerical experiments, we demonstrate that versions of SAGA, SVRG, SARAH, and SARGE using our framework significantly outperform non-accelerated versions and compare favourably with algorithms using negative momentum.
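For readers unfamiliar with the gradient estimators named above, the following is a minimal Python sketch of the classical SVRG estimator that the abstract refers to: a "snapshot"-based correction of the stochastic gradient. The function names (svrg_estimator, svrg), the plain inner loop, and the toy problem are illustrative assumptions only; this is not the accelerated framework developed in the paper.

    import numpy as np

    def svrg_estimator(grad_i, i, x, x_snap, full_grad_snap):
        """SVRG-style variance-reduced gradient estimate at x.

        grad_i(j, x)    -- gradient of the j-th component function at x
        x_snap          -- snapshot point where the full gradient was computed
        full_grad_snap  -- (1/n) * sum_j grad_i(j, x_snap)

        The estimate is unbiased: averaging over the random index i recovers
        the full gradient at x, while its variance shrinks as x and x_snap
        approach a common minimiser.
        """
        return grad_i(i, x) - grad_i(i, x_snap) + full_grad_snap

    def svrg(grad_i, n, x0, step, epochs=20, inner_iters=None, rng=None):
        """Plain (non-accelerated) SVRG loop for f(x) = (1/n) * sum_j f_j(x)."""
        rng = np.random.default_rng(0) if rng is None else rng
        inner_iters = n if inner_iters is None else inner_iters
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(epochs):
            # One full-gradient evaluation per epoch defines the snapshot.
            x_snap = x.copy()
            full_grad = np.mean([grad_i(j, x_snap) for j in range(n)], axis=0)
            for _ in range(inner_iters):
                j = int(rng.integers(n))
                x = x - step * svrg_estimator(grad_i, j, x, x_snap, full_grad)
        return x

    if __name__ == "__main__":
        # Toy least-squares problem: f_j(x) = 0.5 * (a_j @ x - b_j) ** 2.
        rng = np.random.default_rng(0)
        A, b = rng.standard_normal((200, 10)), rng.standard_normal(200)
        grad = lambda j, x: (A[j] @ x - b[j]) * A[j]
        x_hat = svrg(grad, n=200, x0=np.zeros(10), step=1e-2)
        print("residual norm:", np.linalg.norm(A @ x_hat - b))

The accelerated variants discussed in the paper replace the plain inner update above with momentum-style extrapolation steps and admit other estimators (SAGA, SARAH, SARGE) in place of svrg_estimator; the sketch only makes the role of the snapshot correction term concrete.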
Pages: 671–715
Number of pages: 44
Related papers (50 in total)
  • [1] Accelerating variance-reduced stochastic gradient methods
    Driggs, Derek
    Ehrhardt, Matthias J.
    Schönlieb, Carola-Bibiane
    Mathematical Programming, 2022, 191(2): 671–715
  • [2] Stochastic Variance-Reduced Policy Gradient
    Papini, Matteo
    Binaghi, Damiano
    Canonaco, Giuseppe
    Pirotta, Matteo
    Restelli, Marcello
    International Conference on Machine Learning, 2018, Vol. 80
  • [3] Cocoercivity, smoothness and bias in variance-reduced stochastic gradient methods
    Morin, Martin
    Giselsson, Pontus
    Numerical Algorithms, 2022, 91: 749–772
  • [4] Cocoercivity, smoothness and bias in variance-reduced stochastic gradient methods
    Morin, Martin
    Giselsson, Pontus
    Numerical Algorithms, 2022, 91(2): 749–772
  • [5] Sampling and Update Frequencies in Proximal Variance-Reduced Stochastic Gradient Methods
    Morin, Martin
    Giselsson, Pontus
    Journal of Optimization Theory and Applications, 2025, 205(3)
  • [6] Stochastic Variance-Reduced Cubic Regularization Methods
    Zhou, Dongruo
    Xu, Pan
    Gu, Quanquan
    Journal of Machine Learning Research, 2019, 20
  • [7] Stochastic variance-reduced cubic regularization methods
    Zhou, Dongruo
    Xu, Pan
    Gu, Quanquan
    Journal of Machine Learning Research, 2019, 20
  • [8] Variance-Reduced Stochastic Gradient Descent on Streaming Data
    Jothimurugesan, Ellango
    Tahmasbi, Ashraf
    Gibbons, Phillip B.
    Tirthapura, Srikanta
    Advances in Neural Information Processing Systems 31 (NIPS 2018), 2018, 31
  • [9] Subsampled Stochastic Variance-Reduced Gradient Langevin Dynamics
    Zou, Difan
    Xu, Pan
    Gu, Quanquan
    Uncertainty in Artificial Intelligence, 2018: 508–518
  • [10] Plug-and-Play Image Reconstruction Meets Stochastic Variance-Reduced Gradient Methods
    Monardo, Vincent
    Iyer, Abhiram
    Donegan, Sean
    De Graef, Marc
    Chi, Yuejie
    2021 IEEE International Conference on Image Processing (ICIP), 2021: 2868–2872