Understanding the acceleration phenomenon via high-resolution differential equations

Cited by: 85
Authors
Shi, Bin [1 ]
Du, Simon S. [2 ]
Jordan, Michael, I [3 ]
Su, Weijie J. [4 ]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, State Key Lab Sci & Engn Comp, Beijing, Peoples R China
[2] Univ Washington, Seattle, WA 98195 USA
[3] Univ Calif Berkeley, Berkeley, CA 94720 USA
[4] Univ Penn, Philadelphia, PA 19104 USA
Keywords
Convex optimization; First-order method; Polyak's heavy ball method; Nesterov's accelerated gradient methods; Ordinary differential equation; Lyapunov function; Gradient minimization; CONVERGENCE; OPTIMIZATION; ALGORITHM; SYSTEM
DOI
10.1007/s10107-021-01681-8
Chinese Library Classification (CLC)
TP31 [Computer software]
Subject classification codes
081202; 0835
Abstract
Gradient-based optimization algorithms can be studied from the perspective of limiting ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not distinguish between two fundamentally different algorithms, Nesterov's accelerated gradient method for strongly convex functions (NAG-SC) and Polyak's heavy-ball method, we study an alternative limiting process that yields high-resolution ODEs. We show that these ODEs permit a general Lyapunov function framework for the analysis of convergence in both continuous and discrete time. We also show that these ODEs are more accurate surrogates for the underlying algorithms; in particular, they not only distinguish between NAG-SC and Polyak's heavy-ball method, but they also allow the identification of a term that we refer to as the "gradient correction", which is present in NAG-SC but not in the heavy-ball method and is responsible for the qualitative difference in convergence between the two methods. We also use the high-resolution ODE framework to study Nesterov's accelerated gradient method for (non-strongly) convex functions (NAG-C), uncovering a hitherto unknown result: NAG-C minimizes the squared gradient norm at an inverse cubic rate. Finally, by modifying the high-resolution ODE of NAG-C, we obtain a family of new optimization methods that are shown to maintain the accelerated convergence rates of NAG-C for smooth convex functions.
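As a sketch of the objects the abstract refers to (up to the exact constants stated in the paper, with s the step size and mu the strong-convexity parameter), the high-resolution ODEs of the two strongly convex methods differ only in a Hessian-driven damping term:

    \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + (1 + \sqrt{\mu s})\,\nabla f(X(t)) = 0    (heavy ball)

    \ddot{X}(t) + 2\sqrt{\mu}\,\dot{X}(t) + \sqrt{s}\,\nabla^{2} f(X(t))\,\dot{X}(t) + (1 + \sqrt{\mu s})\,\nabla f(X(t)) = 0    (NAG-SC)

The extra term \sqrt{s}\,\nabla^{2} f(X)\,\dot{X} is the gradient correction: it is of order \sqrt{s}, so it survives in the high-resolution limit but vanishes in the classical limit s -> 0, which is why low-resolution ODEs cannot tell the two methods apart.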
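For a concrete discrete-time comparison, the sketch below runs both methods on an illustrative strongly convex quadratic. The matrix A, step size s, and iteration count are arbitrary choices for illustration, not values from the paper; both methods share the momentum coefficient (1 - sqrt(mu*s)) / (1 + sqrt(mu*s)) commonly used in this parametrization.

    import numpy as np

    # Illustrative strongly convex quadratic f(x) = 0.5 * x.T @ A @ x,
    # with mu = 1 (smallest eigenvalue) and L = 100 (largest eigenvalue).
    A = np.diag([1.0, 100.0])

    def grad(x):
        return A @ x

    mu = 1.0
    s = 1.0 / 400.0                                       # step size, here 1/(4L)
    beta = (1 - np.sqrt(mu * s)) / (1 + np.sqrt(mu * s))  # shared momentum coefficient

    x0 = np.array([1.0, 1.0])
    x_hb, x_hb_prev = x0.copy(), x0.copy()                # heavy-ball state
    x_nag, y_nag = x0.copy(), x0.copy()                   # NAG-SC state

    for _ in range(300):
        # Polyak heavy ball: gradient taken at the current iterate x_k.
        x_hb, x_hb_prev = x_hb + beta * (x_hb - x_hb_prev) - s * grad(x_hb), x_hb
        # NAG-SC: gradient taken at the extrapolated point y_k; this
        # look-ahead is the discrete footprint of the gradient correction.
        x_next = y_nag - s * grad(y_nag)
        y_nag = x_next + beta * (x_next - x_nag)
        x_nag = x_next

    print("heavy ball  f =", 0.5 * x_hb @ A @ x_hb)
    print("NAG-SC      f =", 0.5 * x_nag @ A @ x_nag)

The printed objective values give only a rough sense of progress on this toy problem; the accelerated rate of NAG-SC is established by the paper's Lyapunov analysis, not by any single run.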
Pages: 79-148
Number of pages: 70