On the Last Iterate Convergence of Momentum Methods

Cited by: 0
Authors
Li, Xiaoyu [1 ]
Liu, Mingrui [2 ]
Orabona, Francesco [3 ]
Affiliations
[1] Boston Univ, Div Syst Engn, Boston, MA 02215 USA
[2] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA
[3] Boston Univ, Elect & Comp Engn, Boston, MA 02215 USA
Source
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 167, 2022
Funding
US National Science Foundation (NSF)
Keywords
Convex optimization; Momentum methods; Stochastic optimization
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, like averaging of the iterates and a projection onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of Ω(ln T / √T) after T iterations. Based on this fact, we study a class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based SGDM algorithms with increasing momentum and shrinking updates. For these algorithms, we show that the last iterate has the optimal convergence rate O(1/√T) for unconstrained convex stochastic optimization problems, without projections onto bounded domains or knowledge of T. Further, we show a variety of results for FTRL-based SGDM when used with adaptive stepsizes. Empirical results are shown as well.
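For orientation, the sketch below shows the standard heavy-ball-style SGDM update with a constant momentum factor, constant step size, no projection, and no iterate averaging, i.e., the baseline setting to which the Ω(ln T / √T) last-iterate lower bound applies. The function names, step-size and momentum values, and the toy objective are illustrative assumptions only; the paper's FTRL-based variants with increasing momentum and shrinking updates are not reproduced here.

```python
import numpy as np

# Minimal sketch of standard SGD with Momentum (SGDM), heavy-ball style.
# This is the common baseline with a constant momentum factor beta and a
# constant step size eta; the paper's FTRL-based algorithms with increasing
# momentum and shrinking updates use different schedules and are NOT shown.

def sgdm(grad_fn, x0, eta=0.01, beta=0.9, T=1000, rng=None):
    """Run T steps of SGDM on a stochastic (sub)gradient oracle grad_fn.

    grad_fn(x, rng) should return an unbiased estimate of a subgradient at x.
    Returns the last iterate, whose convergence rate is the focus of the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(T):
        g = grad_fn(x, rng)
        m = beta * m + g          # momentum buffer with constant factor beta
        x = x - eta * m           # constant step size; no projection step
    return x                      # last iterate (no averaging)

# Toy example (hypothetical): a Lipschitz, non-smooth convex objective
# f(x) = |x| with noisy subgradients.
if __name__ == "__main__":
    noisy_subgrad = lambda x, rng: np.sign(x) + 0.1 * rng.standard_normal(x.shape)
    x_T = sgdm(noisy_subgrad, x0=np.array([5.0]), T=5000)
    print("last iterate:", x_T)
```

Returning only the final iterate, with no averaging and no projection onto a bounded domain, mirrors the unconstrained last-iterate setting the abstract describes; the toy |x| objective merely makes the behavior easy to inspect.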
Pages: 19