Choose Your Path Wisely: Gradient Descent in a Bregman Distance Framework

被引:9
作者
Benning, Martin [1 ]
Betcke, Marta M. [2 ]
Ehrhardt, Matthias J. [3 ]
Schonlieb, Carola-Bibiane [4 ]
机构
[1] Queen Mary Univ London, Sch Math Sci, London E1 4NS, England
[2] UCL, Dept Comp Sci, London WC1V 6LJ, England
[3] Univ Bath, Inst Math Innovat, Bath BA2 7JU, Avon, England
[4] Univ Cambridge, Dept Appl Math & Theoret Phys, Cambridge CB3 0WA, England
基金
英国工程与自然科学研究理事会; 欧盟地平线“2020”;
关键词
nonconvex optimization; nonsmooth optimization; gradient descent; Bregman iteration; linearized Bregman iteration; parallel MRI; blind deconvolution; deep learning; ALTERNATING LINEARIZED MINIMIZATION; PROXIMAL ALGORITHM; NONCONVEX; OPTIMIZATION; CONVERGENCE; RECONSTRUCTION; REGULARIZATION; ITERATIONS;
D O I
10.1137/20M1357500
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We propose an extension of a special form of gradient descent|in the literature known as linearized Bregman iteration-to a larger class of nonconvex functions. We replace the classical (squared) two norm metric in the gradient descent setting with a generalized Bregman distance, based on a proper, convex, and lower semicontinuous function. The algorithm's global convergence is proven for functions that satisfy the Kurdyka-Lojasiewicz property. Examples illustrate that features of different scale are being introduced throughout the iteration, transitioning from coarse to fine. This coarse-to-fine approach with respect to scale allows us to recover solutions of nonconvex optimization problems that are superior to those obtained with conventional gradient descent, or even projected and proximal gradient descent. The effectiveness of the linearized Bregman iteration in combination with early stopping is illustrated for the applications of parallel magnetic resonance imaging, blind deconvolution, as well as image classification with neural networks.
引用
收藏
页码:814 / 843
页数:30
相关论文
共 85 条
[1]   Schwartz Functions on Nash Manifolds [J].
Aizenbud, Avraham ;
Gourevitch, Dmitry .
INTERNATIONAL MATHEMATICS RESEARCH NOTICES, 2008, 2008
[2]  
[Anonymous], 2017, ARXIV170702278
[3]  
[Anonymous], 2012, Advances in Neural Information Processing Systems
[4]  
[Anonymous], 2016, BLIND IMAGE DECONVOL
[5]  
[Anonymous], 2016, ARXIV161003446
[6]  
[Anonymous], 2017, MI Lecture Notes
[7]  
[Anonymous], 2007, FAST DISCRETE OPTIMI
[8]   On the convergence of the proximal algorithm for nonsmooth functions involving analytic features [J].
Attouch, Hedy ;
Bolte, Jerome .
MATHEMATICAL PROGRAMMING, 2009, 116 (1-2) :5-16
[9]   Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods [J].
Attouch, Hedy ;
Bolte, Jerome ;
Svaiter, Benar Fux .
MATHEMATICAL PROGRAMMING, 2013, 137 (1-2) :91-129
[10]   Proximal Alternating Minimization and Projection Methods for Nonconvex Problems: An Approach Based on the Kurdyka-Lojasiewicz Inequality [J].
Attouch, Hedy ;
Bolte, Jerome ;
Redont, Patrick ;
Soubeyran, Antoine .
MATHEMATICS OF OPERATIONS RESEARCH, 2010, 35 (02) :438-457