On the Last Iterate Convergence of Momentum Methods

Cited by: 0

Authors:

Li, Xiaoyu [1]

Liu, Mingrui [2]

Orabona, Francesco [3]

Affiliations:

[1] Boston Univ, Div Syst Engn, Boston, MA 02215 USA

[2] George Mason Univ, Dept Comp Sci, Fairfax, VA 22030 USA

[3] Boston Univ, Elect & Comp Engn, Boston, MA 02215 USA

Source:

International Conference on Algorithmic Learning Theory, Vol. 167 | 2022 / Volume 167

Funding:

National Science Foundation (USA);

Keywords:

Convex Optimization; Momentum methods; Stochastic Optimization;

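DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract: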

SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, like averaging of the iterates and a projection onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of Ω(ln T/√T) after T iterations. Based on this fact, we study a class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based SGDM algorithms with increasing momentum and shrinking updates. For these algorithms, we show that the last iterate has optimal convergence O(1/√T) for unconstrained convex stochastic optimization problems, without projections onto bounded domains or knowledge of T. Further, we show a variety of results for FTRL-based SGDM when used with adaptive stepsizes. Empirical results are shown as well.
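
As a rough illustration of what "the last iterate of SGDM with a constant momentum factor" refers to, the following is a minimal sketch and not the paper's algorithm or its lower-bound construction. It assumes the common heavy-ball form m_t = β m_{t-1} + (1 − β) g_t, x_{t+1} = x_t − η_t m_t with stepsize η_t = η_0/√t, applied without noise to the 1-Lipschitz convex function f(x) = |x|. The function names, the test function, and all parameter values are hypothetical choices made for this example.

```python
import math


def abs_subgradient(x: float) -> float:
    """Return a subgradient of f(x) = |x|, a 1-Lipschitz convex function."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0  # 0 is a valid subgradient at the minimizer


def sgdm_last_iterate(x0: float, beta: float = 0.9,
                      eta0: float = 0.5, T: int = 1000) -> float:
    """Run heavy-ball momentum for T steps and return the last iterate.

    Assumed update (illustrative, not taken from the paper):
        m_t     = beta * m_{t-1} + (1 - beta) * g_t
        x_{t+1} = x_t - (eta0 / sqrt(t)) * m_t
    No averaging of iterates and no projection is applied.
    """
    x, m = x0, 0.0
    for t in range(1, T + 1):
        g = abs_subgradient(x)            # noise-free subgradient
        m = beta * m + (1.0 - beta) * g   # constant momentum factor
        x -= (eta0 / math.sqrt(t)) * m    # decaying stepsize
    return x


if __name__ == "__main__":
    x_last = sgdm_last_iterate(5.0)
    print("f(last iterate) =", abs(x_last))
```

Setting `beta=0.0` reduces the sketch to plain subgradient descent, which gives a simple baseline for comparing the suboptimality of the last iterate.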


Pages: 19
