Understanding the acceleration phenomenon via high-resolution differential equations

Cited by: 85
Authors
Shi, Bin [1 ]
Du, Simon S. [2 ]
Jordan, Michael, I [3 ]
Su, Weijie J. [4 ]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, State Key Lab Sci & Engn Comp, Beijing, Peoples R China
[2] Univ Washington, Seattle, WA 98195 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
[4] Univ Penn, Philadelphia, PA 19104 USA
Keywords
Convex optimization; First-order method; Polyak's heavy ball method; Nesterov's accelerated gradient methods; Ordinary differential equation; Lyapunov function; Gradient minimization; CONVERGENCE; OPTIMIZATION; ALGORITHM; SYSTEM
DOI
10.1007/s10107-021-01681-8
Chinese Library Classification (CLC)
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms, Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method, we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they also allow the identification of a term that we refer to as the "gradient correction", which is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence between the two methods. We also use the high-resolution ODE framework to study Nesterov's accelerated gradient method for (non-strongly) convex functions (NAG-C), uncovering a hitherto unknown result: NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.
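As a sketch of the objects the abstract refers to (up to the exact constants stated in the paper, with s the step size and mu the strong-convexity parameter), the high-resolution ODEs of the two strongly convex methods differ only in a Hessian-driven damping term:

    \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + (1 + \sqrt{\mu s})\,\nabla f(X(t)) = 0    (heavy ball)

    \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \sqrt{s}\,\nabla^{2} f(X(t))\,\dot{X}(t) + (1 + \sqrt{\mu s})\,\nabla f(X(t)) = 0    (NAG-SC)

The extra term \sqrt{s}\,\nabla^{2} f(X)\,\dot{X} is the gradient correction: it is of order \sqrt{s}, so it survives in the high-resolution limit but vanishes in the classical limit s -> 0, which is why low-resolution ODEs cannot tell the two methods apart.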
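For a concrete discrete-time comparison, the sketch below runs both methods on an illustrative strongly convex quadratic. The matrix A, step size s, and iteration count are arbitrary choices for illustration, not values from the paper; both methods share the momentum coefficient (1 - sqrt(mu*s)) / (1 + sqrt(mu*s)) commonly used in this parametrization.

    import numpy as np

    # Illustrative strongly convex quadratic f(x) = 0.5 * x.T @ A @ x,
    # with mu = 1 (smallest eigenvalue) and L = 100 (largest eigenvalue).
    A = np.diag([1.0, 100.0])

    def grad(x):
        return A @ x

    mu = 1.0
    s = 1.0 / 400.0                                       # step size, here 1/(4L)
    beta = (1 - np.sqrt(mu * s)) / (1 + np.sqrt(mu * s))  # shared momentum coefficient

    x0 = np.array([1.0, 1.0])
    x_hb, x_hb_prev = x0.copy(), x0.copy()                # heavy-ball state
    x_nag, y_nag = x0.copy(), x0.copy()                   # NAG-SC state

    for _ in range(300):
        # Polyak heavy ball: gradient taken at the current iterate x_k.
        x_hb, x_hb_prev = x_hb + beta * (x_hb - x_hb_prev) - s * grad(x_hb), x_hb
        # NAG-SC: gradient taken at the extrapolated point y_k; this
        # look-ahead is the discrete footprint of the gradient correction.
        x_next = y_nag - s * grad(y_nag)
        y_nag = x_next + beta * (x_next - x_nag)
        x_nag = x_next

    print("heavy ball  f =", 0.5 * x_hb @ A @ x_hb)
    print("NAG-SC      f =", 0.5 * x_nag @ A @ x_nag)

The printed objective values give only a rough sense of progress on this toy problem; the accelerated rate of NAG-SC is established by the paper's Lyapunov analysis, not by any single run.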
Pages: 79-148
Number of pages: 70