On the Last Iterate Convergence of Momentum Methods

Cited by: 0
Authors
Li, Xiaoyu [1 ]
Liu, Mingrui [2 ]
Orabona, Francesco [3 ]
Affiliations
[1] Boston Univ, Div Syst Engn, Boston, MA 02215 USA
[2] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
[3] Boston Univ, Elect & Comp Engn, Boston, MA 02215 USA
Source
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022
Funding
U.S. National Science Foundation;
Keywords
Convex Optimization; Momentum methods; Stochastic Optimization;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, such as averaging the iterates and projecting onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor there exists a Lipschitz, convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of Ω(ln T / √T) after T iterations. Based on this fact, we study a class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based SGDM algorithms with increasing momentum and shrinking updates. For these algorithms, we show that the last iterate has the optimal convergence rate O(1/√T) for unconstrained convex stochastic optimization problems, without projections onto bounded domains or knowledge of T. Further, we show a variety of results for FTRL-based SGDM when used with adaptive stepsizes. Empirical results are presented as well.
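The abstract contrasts plain SGDM, run without iterate averaging or projections, against FTRL-based variants that use increasing momentum and shrinking updates. The following is a minimal sketch of the baseline half of that comparison: SGDM with a constant momentum factor, evaluated at its last iterate on a toy Lipschitz convex problem. The stepsize schedule η/√t, the momentum factor β = 0.9, and the toy objective are illustrative assumptions, not the exact schedules analyzed in the paper.

```python
import numpy as np

def sgdm_last_iterate(grad, x0, T, eta=0.1, beta=0.9, seed=0):
    """Plain SGD with a constant momentum factor; returns only the last iterate.

    grad(x, rng) must return a stochastic (sub)gradient at x.
    The eta / sqrt(t) stepsize decay is an illustrative assumption.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for t in range(1, T + 1):
        g = grad(x, rng)
        m = beta * m + g                  # exponential momentum buffer
        x = x - (eta / np.sqrt(t)) * m    # shrinking stepsize, no projection
    return x

# Toy problem: f(x) = |x - 1|, a 1-Lipschitz convex function,
# observed through subgradients corrupted by Gaussian noise.
def noisy_subgrad(x, rng):
    return np.sign(x - 1.0) + 0.1 * rng.standard_normal(size=np.shape(x))

x_T = sgdm_last_iterate(noisy_subgrad, x0=np.array([5.0]), T=10_000)
print(x_T)  # last iterate; the minimizer of the toy objective is x* = 1
```

The FTRL-based algorithms studied in the paper differ from this sketch by letting the momentum sequence increase toward 1 while the update magnitudes shrink, which is what yields the O(1/√T) last-iterate rate without projections or knowledge of T.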
Pages: 19