Stochastic optimization with momentum: Convergence, fluctuations, and traps avoidance

Cited by: 4
Authors
Barakat, Anas [1]
Bianchi, Pascal [1]
Hachem, Walid [2]
Schechtman, Sholom [2]
Affiliations
[1] IP Paris, Telecom Paris, LTCI, Paris, France
[2] Univ Gustave Eiffel, ESIEE Paris, CNRS, LIGM, F-77454 Marne La Vallee, France
Keywords
Stochastic approximation; dynamical systems; adaptive gradient methods with momentum; Nesterov accelerated gradient; ADAM; avoidance of traps
KeyWords Plus: APPROXIMATION; BEHAVIOR; SYSTEM; RATES
DOI
10.1214/21-EJS1880
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent, among them the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is viewed as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided in the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance-of-traps result for non-autonomous settings, which is of independent interest.
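To make the family of updates concrete, below is a minimal sketch of one special case named in the abstract, the stochastic heavy ball method. It is an illustrative toy, not the paper's general procedure: the quadratic objective, the Gaussian gradient-noise model, and the constant hyperparameters lr and momentum are all assumptions chosen for this example.

```python
# Illustrative sketch only: a stochastic heavy-ball iteration, one of the
# momentum variants unified in the paper. The toy objective, the Gaussian
# gradient noise and the hyperparameters below are assumptions made for
# this example, not the paper's general scheme.
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2 (minimum at 0)."""
    return x

def stochastic_heavy_ball(x0, lr=0.01, momentum=0.9, noise_std=0.1, n_steps=5000):
    """Run x_{k+1} = x_k + m_{k+1} with m_{k+1} = momentum * m_k - lr * g_k,
    where g_k is a noisy gradient of f at x_k."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(n_steps):
        g = grad_f(x) + noise_std * rng.standard_normal(x.shape)  # noisy gradient oracle
        m = momentum * m - lr * g  # velocity (momentum) update
        x = x + m                  # position update
    return x

# The iterates should settle near the unique critical point x* = 0.
print(stochastic_heavy_ball(np.array([5.0, -3.0])))
```

In continuous time, the heavy-ball iteration corresponds to the damped dynamics x''(t) + gamma * x'(t) + grad f(x(t)) = 0. Adam and S-NAG replace the constant momentum and step size with time-dependent coefficients (and, for Adam, a coordinate-wise rescaling of the gradient), which is why the limiting ODE studied in the paper is non-autonomous.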
Pages: 3892-3947
Page count: 56