On Almost Sure Convergence Rates of Stochastic Gradient Methods

Cited by: 0
Authors
Liu, Jun [1 ]
Yuan, Ye [2 ,3 ]
Affiliations
[1] Univ Waterloo, Dept Appl Math, Waterloo, ON, Canada
[2] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan, Peoples R China
[3] Huazhong Univ Sci & Technol, Sch Mech Sci & Engn, Wuhan, Peoples R China
Source
CONFERENCE ON LEARNING THEORY, VOL 178 | 2022, Vol. 178
Funding
Natural Sciences and Engineering Research Council of Canada (NSERC);
Keywords
Stochastic gradient descent; stochastic heavy-ball; stochastic Nesterov's accelerated gradient; almost sure convergence rate; OPTIMIZATION; BOUNDS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The vast majority of convergence rate analyses for stochastic gradient methods in the literature focus on convergence in expectation, whereas trajectory-wise almost sure convergence is clearly important to ensure that any individual run of a stochastic algorithm converges with probability one. Here we provide a unified almost sure convergence rate analysis for the stochastic gradient descent (SGD), stochastic heavy-ball (SHB), and stochastic Nesterov's accelerated gradient (SNAG) methods. We show, for the first time, that the almost sure convergence rates obtained for these stochastic gradient methods on strongly convex functions are arbitrarily close to their optimal possible convergence rates. For non-convex objective functions, we show that not only a weighted average of the squared gradient norms but also the squared gradient norms of the last iterates converge to zero almost surely. We further provide a last-iterate almost sure convergence rate analysis for stochastic gradient methods on general convex smooth functions, in contrast with most existing results in the literature, which only provide convergence in expectation for a weighted average of the iterates.
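For orientation, the sketch below illustrates the three update rules named in the abstract (SGD, SHB, SNAG) on a toy strongly convex quadratic with noisy gradients. It is a minimal illustration, not the paper's implementation: the objective, the noise model, the diminishing step size c/(t+1), and the constant momentum parameter are assumed choices for demonstration, not the schedules analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)


def noisy_grad(x):
    """Stochastic gradient of f(x) = 0.5 * ||x||^2 with zero-mean noise (toy model)."""
    return x + 0.1 * rng.standard_normal(x.shape)


def run(method, T=10_000, dim=10, c=1.0, beta=0.9):
    x = np.ones(dim)
    v = np.zeros(dim)              # momentum buffer for SHB
    x_prev = x.copy()              # previous iterate for SNAG
    for t in range(T):
        alpha = c / (t + 1)        # illustrative diminishing step size
        if method == "sgd":        # stochastic gradient descent
            x = x - alpha * noisy_grad(x)
        elif method == "shb":      # stochastic heavy-ball
            v = beta * v - alpha * noisy_grad(x)
            x = x + v
        elif method == "snag":     # stochastic Nesterov's accelerated gradient
            y = x + beta * (x - x_prev)   # look-ahead point
            x_prev = x
            x = y - alpha * noisy_grad(y)
    return np.linalg.norm(x)       # distance to the minimizer x* = 0


for m in ("sgd", "shb", "snag"):
    print(m, run(m))
```

Each printed value is the distance of a single trajectory's last iterate to the minimizer; the paper's results concern how fast such trajectory-wise quantities decay with probability one.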
Pages: 21
Related Papers
50 records in total
  • [41] Convergence in High Probability of Distributed Stochastic Gradient Descent Algorithms
    Lu, Kaihong
    Wang, Hongxia
    Zhang, Huanshui
    Wang, Long
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2024, 69 (04) : 2189 - 2204
  • [42] On the Convergence of Stochastic Multi-Objective Gradient Manipulation and Beyond
    Zhou, Shiji
    Zhang, Wenpeng
    Jiang, Jiyan
    Zhong, Wenliang
    Gu, Jinjie
    Zhu, Wenwu
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [43] Fast Convergence for Stochastic and Distributed Gradient Descent in the Interpolation Limit
    Mitra, Partha P.
    2018 26TH EUROPEAN SIGNAL PROCESSING CONFERENCE (EUSIPCO), 2018, : 1890 - 1894
  • [44] Learning Rates for Stochastic Gradient Descent With Nonconvex Objectives
    Lei, Yunwen
    Tang, Ke
    IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2021, 43 (12) : 4505 - 4511
  • [45] Understanding the Role of Momentum in Stochastic Gradient Methods
    Gitman, Igor
    Lang, Hunter
    Zhang, Pengchuan
    Xiao, Lin
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [46] Riemannian gradient methods for stochastic composition problems
    Huang, Feihu
    Gao, Shangqian
    NEURAL NETWORKS, 2022, 153 : 224 - 234
  • [47] Convergence Rates of Zeroth Order Gradient Descent for Łojasiewicz Functions
    Wang, Tianyu
    Feng, Yasong
    INFORMS JOURNAL ON COMPUTING, 2024, 36 (06) : 1611 - 1633
  • [48] Convergence rates for shallow neural networks learned by gradient descent
    Braun, Alina
    Kohler, Michael
    Langer, Sophie
    Walk, Harro
    BERNOULLI, 2024, 30 (01) : 475 - 502
  • [49] Dual stochastic natural gradient descent and convergence of interior half-space gradient approximations
    Sanchez-Lopez, Borja
    Cerquides, Jesus
    INFORMATION GEOMETRY, 2025, : 125 - 157
  • [50] REGRESSION METHODS FOR STOCHASTIC CONTROL PROBLEMS AND THEIR CONVERGENCE ANALYSIS
    Belomestny, Denis
    Kolodko, Anastasia
    Schoenmakers, John
    SIAM JOURNAL ON CONTROL AND OPTIMIZATION, 2010, 48 (05) : 3562 - 3588