First-Order Methods for Convex Optimization

Cited by: 15
Authors
Dvurechensky, Pavel [1 ,2 ,3 ]
Shtern, Shimrit [4 ]
Staudigl, Mathias [5 ,6 ]
Affiliations
[1] Weierstrass Inst Appl Anal & Stochast, Mohrenstr 39, D-10117 Berlin, Germany
[2] Inst Informat Transmiss Problems RAS, Bolshoy Karetny Per 19,Build 1, Moscow 127051, Russia
[3] Moscow Inst Phys & Technol, 9 Inst Skiy Per, Dolgoprudnyi 141701, Moscow Region, Russia
[4] Technion Israel Inst Technol, Fac Ind Engn & Management, Haifa, Israel
[5] Maastricht Univ, Dept Data Sci & Knowledge Engn DKE, Paul Henri Spaaklaan 1, NL-6229 EN Maastricht, Netherlands
[6] Maastricht Univ, Math Ctr Maastricht MCM, Paul Henri Spaaklaan 1, NL-6229 EN Maastricht, Netherlands
Keywords
Convex Optimization; Composite Optimization; First-Order Methods; Numerical Algorithms; Convergence Rate; Proximal Mapping; Proximity Operator; Bregman Divergence; Stochastic Composite Optimization; Projected Subgradient Methods; Intermediate Gradient Method; Coordinate Descent Methods; Variational Inequalities; Mirror Descent; Frank-Wolfe; Approximation Algorithms; Thresholding Algorithm; Minimization Algorithm
DOI
10.1016/j.ejco.2021.100015
Chinese Library Classification
C93 [Management]; O22 [Operations Research]
Subject Classification Codes
070105; 12; 1201; 1202; 120202
Abstract
First-order methods for solving convex optimization problems have been at the forefront of mathematical optimization for the last 20 years. The rapid development of this important class of algorithms is motivated by success stories reported in various applications, most importantly machine learning, signal processing, imaging, and control theory. First-order methods have the potential to provide low-accuracy solutions at low computational cost, which makes them an attractive set of tools for large-scale optimization problems. In this survey, we cover a number of key developments in gradient-based optimization methods, including non-Euclidean extensions of the classical proximal gradient method and its accelerated versions. Additionally, we survey recent developments within the class of projection-free methods and proximal versions of primal-dual schemes. We give complete proofs of several key results and highlight the unifying aspects of several optimization algorithms.
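The abstract mentions the classical proximal gradient method for composite problems. As a minimal, generic illustration (not taken from the survey itself), the sketch below applies the proximal gradient iteration to a lasso-type problem min_x 0.5*||Ax - b||^2 + lam*||x||_1; the function names, problem sizes, regularization weight, and step-size choice are illustrative assumptions, not the authors' implementation.

import numpy as np

def soft_threshold(v, tau):
    # Proximity operator of tau * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def proximal_gradient(A, b, lam, step, iters=500):
    # Proximal gradient method (ISTA) for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    # 'step' should be at most 1 / L, where L = ||A||_2^2 is the Lipschitz
    # constant of the gradient of the smooth part.
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                           # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)    # proximal (prox) step
    return x

if __name__ == "__main__":
    # Illustrative random sparse-recovery instance.
    rng = np.random.default_rng(0)
    A = rng.standard_normal((40, 100))
    x_true = np.zeros(100)
    x_true[:5] = 1.0
    b = A @ x_true + 0.01 * rng.standard_normal(40)
    L = np.linalg.norm(A, 2) ** 2                          # spectral norm squared
    x_hat = proximal_gradient(A, b, lam=0.1, step=1.0 / L)
    print("recovered support:", np.nonzero(np.abs(x_hat) > 1e-3)[0])

Accelerated variants of this scheme (e.g., FISTA-type momentum) and its non-Euclidean (Bregman) extensions, which the survey covers, modify the update above but keep the same gradient-plus-prox structure.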
Pages: 27