SGD with Momentum (SGDM) is a widely used family of algorithms for large-scale optimization of machine learning problems. Yet, when optimizing generic convex functions, no advantage is known for any SGDM algorithm over plain SGD. Moreover, even the most recent results require changes to the SGDM algorithms, such as averaging of the iterates and a projection onto a bounded domain, which are rarely used in practice. In this paper, we focus on the convergence rate of the last iterate of SGDM. For the first time, we prove that for any constant momentum factor, there exists a Lipschitz and convex function for which the last iterate of SGDM suffers from a suboptimal convergence rate of $\Omega(\ln T/\sqrt{T})$ after $T$ iterations. Motivated by this lower bound, we study a class of (both adaptive and non-adaptive) Follow-The-Regularized-Leader-based (FTRL-based) SGDM algorithms with increasing momentum and shrinking updates. For these algorithms, we show that the last iterate has the optimal convergence rate $O(1/\sqrt{T})$ for unconstrained convex stochastic optimization problems, without projections onto bounded domains or knowledge of $T$. Further, we show a variety of results for FTRL-based SGDM when used with adaptive stepsizes. Empirical results are shown as well.
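For orientation, the following is a minimal sketch of textbook heavy-ball SGDM with a constant momentum factor, returning the last iterate rather than an average. It is not the FTRL-based algorithm studied in the paper, nor the construction behind the lower bound. The step-size schedule `base_lr / sqrt(t)`, the test function f(x) = |x|, and all function and parameter names are illustrative assumptions.

```python
import random


def sgdm_last_iterate(subgrad, x0, steps, beta=0.9, base_lr=0.1, noise=0.0, seed=0):
    """Run heavy-ball SGDM on a 1-D problem and return the last iterate.

    Update rule (generic textbook form, assumed here):
        m_t     = beta * m_{t-1} + g_t
        x_{t+1} = x_t - (base_lr / sqrt(t)) * m_t
    No iterate averaging and no projection are applied.
    """
    rng = random.Random(seed)
    x, m = x0, 0.0
    for t in range(1, steps + 1):
        # Stochastic subgradient: exact subgradient plus optional Gaussian noise.
        g = subgrad(x) + noise * rng.gauss(0.0, 1.0)
        m = beta * m + g                      # constant momentum factor beta
        x = x - (base_lr / t ** 0.5) * m      # decreasing step size, T not needed
    return x


def abs_subgrad(x):
    """A subgradient of the 1-Lipschitz convex function f(x) = |x|."""
    if x > 0:
        return 1.0
    if x < 0:
        return -1.0
    return 0.0


# Noise-free run on f(x) = |x|, starting from x0 = 1.
x_T = sgdm_last_iterate(abs_subgrad, x0=1.0, steps=1000)
print(f"last iterate: {x_T:.4f}, suboptimality f(x_T) - f*: {abs(x_T):.4f}")
```

Because the function returns only the final `x`, the printed suboptimality is the quantity whose rate the abstract discusses; swapping in a different `subgrad` or a nonzero `noise` gives a quick way to compare against plain SGD (`beta=0.0`).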
Chang, Huibin
Affiliation: Tianjin Normal Univ, Sch Math Sci, Tianjin 300387, Peoples R China
Tai, Xue-Cheng
Affiliation: Univ Bergen, Dept Math, N-5020 Bergen, Norway
Wang, Li-Lian
Affiliation: Nanyang Technol Univ, Sch Phys & Math Sci, Div Math Sci, Singapore 637371, Singapore
Yang, Danping
Affiliation: E China Normal Univ, Dept Math, Shanghai 200241, Peoples R China
Affiliation: E China Normal Univ, Shanghai Key Lab Pure Math & Math Practice, Shanghai 200241, Peoples R China