Choose Your Path Wisely: Gradient Descent in a Bregman Distance Framework

Cited by: 9
Authors
Benning, Martin [1 ]
Betcke, Marta M. [2 ]
Ehrhardt, Matthias J. [3 ]
Schonlieb, Carola-Bibiane [4 ]
Affiliations
[1] Queen Mary Univ London, Sch Math Sci, London E1 4NS, England
[2] UCL, Dept Comp Sci, London WC1V 6LJ, England
[3] Univ Bath, Inst Math Innovat, Bath BA2 7JU, Avon, England
[4] Univ Cambridge, Dept Appl Math & Theoret Phys, Cambridge CB3 0WA, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC); EU Horizon 2020;
Keywords
nonconvex optimization; nonsmooth optimization; gradient descent; Bregman iteration; linearized Bregman iteration; parallel MRI; blind deconvolution; deep learning; ALTERNATING LINEARIZED MINIMIZATION; PROXIMAL ALGORITHM; NONCONVEX; OPTIMIZATION; CONVERGENCE; RECONSTRUCTION; REGULARIZATION; ITERATIONS;
DOI
10.1137/20M1357500
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
We propose an extension of a special form of gradient descent, known in the literature as the linearized Bregman iteration, to a larger class of nonconvex functions. We replace the classical (squared) two-norm metric in the gradient descent setting with a generalized Bregman distance based on a proper, convex, and lower semicontinuous function. The algorithm's global convergence is proven for functions that satisfy the Kurdyka-Lojasiewicz property. Examples illustrate that features of different scales are introduced throughout the iteration, transitioning from coarse to fine. This coarse-to-fine approach with respect to scale allows us to recover solutions of nonconvex optimization problems that are superior to those obtained with conventional gradient descent, or even projected and proximal gradient descent. The effectiveness of the linearized Bregman iteration in combination with early stopping is illustrated for the applications of parallel magnetic resonance imaging, blind deconvolution, and image classification with neural networks.
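To make the scheme concrete, below is a minimal sketch of the classical linearized Bregman iteration that the paper generalizes: gradient descent on a least-squares fidelity E(x) = 0.5*||Ax - b||^2, with the squared two-norm step replaced by the Bregman distance of an elastic-net function J(x) = ||x||_1 + ||x||^2/(2*gamma). The convex choice of E, the elastic-net J, the step size, and the toy data are illustrative assumptions for this sketch, not the paper's experiments or parameter choices.

```python
import numpy as np

def linearized_bregman(A, b, gamma=5.0, tau=None, iters=2000):
    """Linearized Bregman iteration for E(x) = 0.5 * ||A x - b||^2.

    The gradient step on E is measured in the Bregman distance of
    J(x) = ||x||_1 + ||x||^2 / (2 * gamma), which turns each iteration into
    a subgradient (dual) update followed by soft thresholding.
    """
    m, n = A.shape
    if tau is None:
        # conservative step size so that gamma * tau * ||A||^2 <= 1
        tau = 1.0 / (gamma * np.linalg.norm(A, 2) ** 2)
    x = np.zeros(n)
    p = np.zeros(n)  # p^k is a subgradient of J at x^k
    for _ in range(iters):
        p = p - tau * A.T @ (A @ x - b)  # p^{k+1} = p^k - tau * grad E(x^k)
        # x^{k+1} = grad J*(p^{k+1}), i.e. scaled soft thresholding
        x = gamma * np.sign(p) * np.maximum(np.abs(p) - 1.0, 0.0)
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((60, 200))
    x_true = np.zeros(200)
    x_true[rng.choice(200, 8, replace=False)] = rng.standard_normal(8)
    b = A @ x_true
    x_hat = linearized_bregman(A, b)
    print("relative error:", np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))
```

Because large entries of the dual variable p cross the threshold first, prominent (coarse) features of x appear early in the iteration and finer details only later; this is the coarse-to-fine behavior, and the resulting early-stopping opportunity, referred to in the abstract.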
Pages: 814-843
Page count: 30