Asymptotic Study of Stochastic Adaptive Algorithms in Non-convex Landscape

被引：0

作者：

Gadat, Sebastien ^{[1
]}

Gavra, Ioana ^{[2
]}

机构：

[1] Univ Toulouse 1 Capitole, Toulouse Sch Econ, Inst Univ France, Esplanade Uni, F-31080 Toulouse, France

[2] Univ Rennes, IRMAR, 2 Pl Recteur Henri Moal, Rennes, France

来源：

JOURNAL OF MACHINE LEARNING RESEARCH | 2022年 / 23卷

关键词：

Stochastic optimization; Stochastic adaptive algorithm; Convergence of ran-dom variables; LONG-TIME BEHAVIOR; DYNAMICAL-SYSTEM; APPROXIMATION; OPTIMIZATION; DESCENT;

D O I：

暂无

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

This paper studies some asymptotic properties of adaptive algorithms widely used in opti-mization and machine learning, and among them Adagrad and Rmsprop, which are involved in most of the blackbox deep learning algorithms. Our setup is the non-convex landscape optimization point of view, we consider a one time scale parametrization and the situation where these algorithms may or may not be used with mini-batches. We adopt the point of view of stochastic algorithms and establish the almost sure convergence of these methods when using a decreasing step-size towards the set of critical points of the target function. With a mild extra assumption on the noise, we also obtain the convergence towards the set of minimizers of the function. Along our study, we also obtain a "convergence rate" of the methods, in the vein of the works of Ghadimi and Lan (2013).

引用

页数：54

共 56 条

[1] A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics [J].

Alvarez, F ;

Attouch, H ;

Bolte, J ;

Redont, P .

JOURNAL DE MATHEMATIQUES PURES ET APPLIQUEES, 2002, 81 (08) :747-779

[2]

[Anonymous], 2012, P ADV NEUR INF PROC

[3] The heavy ball with friction method, I. The continuous dynamical system: Global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system [J].

Attouch, H ;

Goudou, X ;

Redont, P .

COMMUNICATIONS IN CONTEMPORARY MATHEMATICS, 2000, 2 (01) :1-34

[4]

Attouch H, 2019, ESAIM CONTR OPTIM CA, V25, DOI 10.1051/cocv/2017083

[5]

Bach F, 2014, J MACH LEARN RES, V15, P595

[6]

Bach Francis, 2011, Advances in Neural Information Processing Systems, P451

[7]

Barakat A., 2020, PREPRINT

[8] A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications [J].

Bauschke, Heinz H. ;

Bolte, Jerome ;

Teboulle, Marc .

MATHEMATICS OF OPERATIONS RESEARCH, 2017, 42 (02) :330-348

[9]

Benaim M., 1996, Journal of Dynamics and Differential Equations, V8, P141, DOI 10.1007/BF02218617

[10] Dynamics of Morse-Smale urn processes [J].

Benaim, M ;

Hirsch, MW .

ERGODIC THEORY AND DYNAMICAL SYSTEMS, 1995, 15 :1005-1030

← 1 2 3 4 5 6 →