Asymptotic Study of Stochastic Adaptive Algorithms in Non-convex Landscape

被引:0
作者
Gadat, Sebastien [1 ]
Gavra, Ioana [2 ]
机构
[1] Univ Toulouse 1 Capitole, Toulouse Sch Econ, Inst Univ France, Esplanade Uni, F-31080 Toulouse, France
[2] Univ Rennes, IRMAR, 2 Pl Recteur Henri Moal, Rennes, France
关键词
Stochastic optimization; Stochastic adaptive algorithm; Convergence of ran-dom variables; LONG-TIME BEHAVIOR; DYNAMICAL-SYSTEM; APPROXIMATION; OPTIMIZATION; DESCENT;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
This paper studies some asymptotic properties of adaptive algorithms widely used in opti-mization and machine learning, and among them Adagrad and Rmsprop, which are involved in most of the blackbox deep learning algorithms. Our setup is the non-convex landscape optimization point of view, we consider a one time scale parametrization and the situation where these algorithms may or may not be used with mini-batches. We adopt the point of view of stochastic algorithms and establish the almost sure convergence of these methods when using a decreasing step-size towards the set of critical points of the target function. With a mild extra assumption on the noise, we also obtain the convergence towards the set of minimizers of the function. Along our study, we also obtain a "convergence rate" of the methods, in the vein of the works of Ghadimi and Lan (2013).
引用
收藏
页数:54
相关论文
共 56 条
[1]   A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics [J].
Alvarez, F ;
Attouch, H ;
Bolte, J ;
Redont, P .
JOURNAL DE MATHEMATIQUES PURES ET APPLIQUEES, 2002, 81 (08) :747-779
[2]  
[Anonymous], 2012, P ADV NEUR INF PROC
[3]   The heavy ball with friction method, I. The continuous dynamical system: Global exploration of the local minima of a real-valued function by asymptotic analysis of a dissipative dynamical system [J].
Attouch, H ;
Goudou, X ;
Redont, P .
COMMUNICATIONS IN CONTEMPORARY MATHEMATICS, 2000, 2 (01) :1-34
[4]  
Attouch H, 2019, ESAIM CONTR OPTIM CA, V25, DOI 10.1051/cocv/2017083
[5]  
Bach F, 2014, J MACH LEARN RES, V15, P595
[6]  
Bach Francis, 2011, Advances in Neural Information Processing Systems, P451
[7]  
Barakat A., 2020, PREPRINT
[8]   A Descent Lemma Beyond Lipschitz Gradient Continuity: First-Order Methods Revisited and Applications [J].
Bauschke, Heinz H. ;
Bolte, Jerome ;
Teboulle, Marc .
MATHEMATICS OF OPERATIONS RESEARCH, 2017, 42 (02) :330-348
[9]  
Benaim M., 1996, Journal of Dynamics and Differential Equations, V8, P141, DOI 10.1007/BF02218617
[10]   Dynamics of Morse-Smale urn processes [J].
Benaim, M ;
Hirsch, MW .
ERGODIC THEORY AND DYNAMICAL SYSTEMS, 1995, 15 :1005-1030