Stochastic optimization with momentum: Convergence, fluctuations, and traps avoidance

Cited by: 4
Authors
Barakat, Anas [1]
Bianchi, Pascal [1]
Hachem, Walid [2]
Schechtman, Sholom [2]
Affiliations
[1] IP Paris, Telecom Paris, LTCI, Paris, France
[2] Univ Gustave Eiffel, ESIEE Paris, CNRS, LIGM, F-77454 Marne La Vallee, France
Keywords
Stochastic approximation; dynamical systems; adaptive gradient methods with momentum; Nesterov accelerated gradient; ADAM; avoidance of traps
KeyWords Plus: APPROXIMATION; BEHAVIOR; SYSTEM; RATES
DOI
10.1214/21-EJS1880
Chinese Library Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics]
Subject Classification Codes
020208; 070103; 0714
Abstract
In this paper, a general stochastic optimization procedure is studied, unifying several variants of stochastic gradient descent, among them the stochastic heavy ball method, the Stochastic Nesterov Accelerated Gradient algorithm (S-NAG), and the widely used Adam algorithm. The algorithm is viewed as a noisy Euler discretization of a non-autonomous ordinary differential equation, recently introduced by Belotto da Silva and Gazeau, which is analyzed in depth. Assuming that the objective function is non-convex and differentiable, the stability and the almost sure convergence of the iterates to the set of critical points are established. A noteworthy special case is the convergence proof of S-NAG in a non-convex setting. Under some assumptions, the convergence rate is provided in the form of a Central Limit Theorem. Finally, the non-convergence of the algorithm to undesired critical points, such as local maxima or saddle points, is established. Here, the main ingredient is a new avoidance-of-traps result for non-autonomous settings, which is of independent interest.
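To make the family of updates concrete, below is a minimal sketch of one special case named in the abstract, the stochastic heavy ball method. It is an illustrative toy, not the paper's general procedure: the quadratic objective, the Gaussian gradient-noise model, and the constant hyperparameters lr and momentum are all assumptions chosen for this example.

```python
# Illustrative sketch only: a stochastic heavy-ball iteration, one of the
# momentum variants unified in the paper. The toy objective, the Gaussian
# gradient noise and the hyperparameters below are assumptions made for
# this example, not the paper's general scheme.
import numpy as np

rng = np.random.default_rng(0)

def grad_f(x):
    """Gradient of the toy objective f(x) = 0.5 * ||x||^2 (minimum at 0)."""
    return x

def stochastic_heavy_ball(x0, lr=0.01, momentum=0.9, noise_std=0.1, n_steps=5000):
    """Run x_{k+1} = x_k + m_{k+1} with m_{k+1} = momentum * m_k - lr * g_k,
    where g_k is a noisy gradient of f at x_k."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(n_steps):
        g = grad_f(x) + noise_std * rng.standard_normal(x.shape)  # noisy gradient oracle
        m = momentum * m - lr * g  # velocity (momentum) update
        x = x + m                  # position update
    return x

# The iterates should settle near the unique critical point x* = 0.
print(stochastic_heavy_ball(np.array([5.0, -3.0])))
```

In continuous time, the heavy-ball iteration corresponds to the damped dynamics x''(t) + gamma * x'(t) + grad f(x(t)) = 0. Adam and S-NAG replace the constant momentum and step size with time-dependent coefficients (and, for Adam, a coordinate-wise rescaling of the gradient), which is why the limiting ODE studied in the paper is non-autonomous.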
Pages: 3892-3947
Page count: 56