Momentum-Based Variance-Reduced Proximal Stochastic Gradient Method for Composite Nonconvex Stochastic Optimization

Cited by: 6
Authors
Xu, Yangyang [1 ]
Xu, Yibo [2 ]
Affiliations
[1] Rensselaer Polytech Inst, Dept Math Sci, Troy, NY 12180 USA
[2] Clemson Univ, Sch Math & Stat Sci, Clemson, SC 29634 USA
Keywords
Stochastic gradient method; Variance reduction; Momentum; Small-batch training; Convex
DOI
10.1007/s10957-022-02132-w
Chinese Library Classification
C93 [Management Science]; O22 [Operations Research]
Discipline Codes
070105; 12; 1201; 1202; 120202
Abstract
Stochastic gradient methods (SGMs) have been extensively used for solving stochastic or large-scale machine learning problems. Recent works employ various techniques to improve the convergence rate of SGMs in both convex and nonconvex settings, but most of them require a large number of samples in some or all iterations. In this paper, we propose a new SGM, named PStorm, for solving nonconvex nonsmooth stochastic problems. With a momentum-based variance reduction technique, PStorm achieves the optimal complexity O(ε^{-3}) for producing a stochastic ε-stationary solution, provided a mean-squared smoothness condition holds. Different from existing optimal methods, PStorm attains the O(ε^{-3}) result using only one or O(1) samples in every update. With this property, PStorm can be applied to online learning problems that favor real-time decisions based on one or O(1) new observations. In addition, for large-scale machine learning problems, PStorm can generalize better with small-batch training than other optimal methods, which require large-batch training, and than the vanilla SGM, as we demonstrate by training a sparse fully connected neural network and a sparse convolutional neural network.
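For readers who want the mechanics behind the one-sample-per-update claim: PStorm combines a STORM-style momentum variance-reduced gradient estimator with a proximal step for the nonsmooth term. The Python sketch below illustrates that update rule under illustrative assumptions (an l1 regularizer with a soft-thresholding prox, constant step size eta and momentum weight beta); the names pstorm, prox_l1, stoch_grad, and sample are ours, and the paper's analysis uses particular parameter schedules rather than constants, so this is a sketch of the update structure, not the authors' implementation.

```python
import numpy as np

def prox_l1(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def pstorm(x0, stoch_grad, sample, prox, n_iters=2000, eta=0.01, beta=0.1):
    """Sketch of a momentum-based variance-reduced proximal SGM (PStorm-style).

    stoch_grad(x, xi): unbiased stochastic gradient of the smooth part at x
                       on the single sample xi (mean-squared smoothness assumed)
    sample():          draws one fresh sample per iteration
    prox(v, t):        proximal operator of t * r(.) for the nonsmooth term r
    """
    xi = sample()
    d = stoch_grad(x0, xi)                 # plain stochastic gradient to start
    x_prev, x = x0, prox(x0 - eta * d, eta)
    for _ in range(n_iters):
        xi = sample()                      # only ONE new sample per update
        # momentum-based variance-reduced estimator: correct the previous
        # direction by the gradient difference evaluated on the SAME sample xi
        d = stoch_grad(x, xi) + (1.0 - beta) * (d - stoch_grad(x_prev, xi))
        x_prev, x = x, prox(x - eta * d, eta)   # proximal (composite) step
    return x

# Toy usage (hypothetical data): sparse stochastic least squares,
# minimize E[0.5 * (a'x - b)^2] + lam * ||x||_1
rng = np.random.default_rng(0)
x_true = np.zeros(50)
x_true[:5] = 1.0
lam = 0.05

def sample():
    a = rng.standard_normal(50)
    b = a @ x_true + 0.1 * rng.standard_normal()
    return a, b

def stoch_grad(x, xi):
    a, b = xi
    return (a @ x - b) * a

x_hat = pstorm(np.zeros(50), stoch_grad, sample,
               prox=lambda v, t: prox_l1(v, lam * t))
```

The key design point is that both gradients in the correction term are evaluated on the same freshly drawn sample xi, which is what drives the estimator's variance down over iterations without resorting to large batches.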
Pages: 266 - 297
Number of pages: 32
Related Papers (50 total)
  • [1] Momentum-Based Variance-Reduced Proximal Stochastic Gradient Method for Composite Nonconvex Stochastic Optimization
    Xu, Yangyang
    Xu, Yibo
    Journal of Optimization Theory and Applications, 2023, 196 : 266 - 297
  • [2] Momentum-based variance-reduced stochastic Bregman proximal gradient methods for nonconvex nonsmooth optimization
    Liao, Shichen
    Liu, Yan
    Han, Congying
    Guo, Tiande
    Expert Systems with Applications, 2025, 266
  • [3] Stochastic Variance-Reduced Cubic Regularization for Nonconvex Optimization
    Wang, Zhe
    Zhou, Yi
    Liang, Yingbin
    Lan, Guanghui
    22nd International Conference on Artificial Intelligence and Statistics, Vol 89, 2019
  • [4] Stochastic variance-reduced prox-linear algorithms for nonconvex composite optimization
    Zhang, Junyu
    Xiao, Lin
    Mathematical Programming, 2022, 195 (1-2) : 649 - 691
  • [5] A Variance-Reduced and Stabilized Proximal Stochastic Gradient Method with Support Identification Guarantees for Structured Optimization
    Dai, Yutong
    Wang, Guanyi
    Curtis, Frank E.
    Robinson, Daniel P.
    International Conference on Artificial Intelligence and Statistics, Vol 206, 2023
  • [6] Stochastic Variance-Reduced Policy Gradient
    Papini, Matteo
    Binaghi, Damiano
    Canonaco, Giuseppe
    Pirotta, Matteo
    Restelli, Marcello
    International Conference on Machine Learning, Vol 80, 2018
  • [7] Estimate Sequences for Variance-Reduced Stochastic Composite Optimization
    Kulunchakov, Andrei
    Mairal, Julien
    International Conference on Machine Learning, Vol 97, 2019
  • [8] Sampling and Update Frequencies in Proximal Variance-Reduced Stochastic Gradient Methods
    Morin, Martin
    Giselsson, Pontus
    Journal of Optimization Theory and Applications, 2025, 205 (3)
  • [9] Nonconvex optimization with inertial proximal stochastic variance reduction gradient
    He, Lulu
    Ye, Jimin
    Jianwei, E.
    Information Sciences, 2023, 648