Smooth over-parameterized solvers for non-smooth structured optimization

Cited by: 0
Authors
Clarice Poon
Gabriel Peyré
Affiliations
[1] University of Bath, Department of Mathematical Sciences
[2] PSL University, CNRS and DMA, École Normale Supérieure
Source
Mathematical Programming | 2023, Vol. 201
Keywords
Sparsity; Low-rank; Compressed sensing; Variable projection; Mirror descent; Non-convex optimization; 68Q25; 68R10; 68U05;
DOI
Not available
Abstract
Non-smooth optimization is a core ingredient of many imaging and machine learning pipelines. Non-smoothness encodes structural constraints on the solutions, such as sparsity, group sparsity, low rank and sharp edges. It is also the basis for the definition of robust loss functions and scale-free functionals such as the square-root Lasso. Standard approaches to dealing with non-smoothness leverage either proximal splitting or coordinate descent. These approaches are effective but usually require parameter tuning, preconditioning or some sort of support pruning. In this work, we advocate and study a different route, which applies a non-convex but smooth over-parameterization of the underlying non-smooth optimization problems. This generalizes the quadratic variational forms that are at the heart of the popular Iterative Reweighted Least Squares (IRLS) method. Our main theoretical contribution connects gradient descent on this reformulation to a mirror descent flow with a varying Hessian metric. This analysis is crucial to derive convergence bounds that are dimension-free, which explains the efficiency of the method when small grid sizes are used in imaging. Our main algorithmic contribution is to apply the Variable Projection (VarPro) method, which defines a new formulation by explicitly minimizing over part of the variables. This leads to better conditioning of the minimized functional and improves the convergence of simple but very efficient gradient-based methods, for instance quasi-Newton solvers. We exemplify the use of this new solver for the resolution of regularized regression problems in inverse problems and supervised learning, including total variation priors and non-convex regularizers.
Pages: 897–952
Number of pages: 55
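
To make the abstract's central idea concrete, the sketch below is a hypothetical, minimal illustration (not the authors' reference implementation) of smoothing a non-smooth Lasso problem by over-parameterization and then combining Variable Projection with a quasi-Newton solver. It relies on the standard identity ||x||_1 = min over factorizations x = u ⊙ v of (||u||² + ||v||²)/2, a Hadamard-product analogue of the quadratic variational forms behind IRLS mentioned in the abstract; the problem sizes, the ridge inner solve and the choice of L-BFGS are illustrative assumptions.

```python
# Minimal sketch: smooth the Lasso  min_x 0.5*||A x - y||^2 + lam*||x||_1
# by over-parameterizing x = u * v (elementwise), which yields the smooth problem
#   min_{u,v} 0.5*||A (u*v) - y||^2 + 0.5*lam*(||u||^2 + ||v||^2),
# then eliminate v in closed form (Variable Projection) and run L-BFGS on u.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p, lam = 50, 200, 0.1
A = rng.standard_normal((n, p))
x_true = np.zeros(p)
x_true[:5] = rng.standard_normal(5)
y = A @ x_true + 0.01 * rng.standard_normal(n)

def varpro_objective(u):
    # Inner problem in v for fixed u is a ridge regression with B = A diag(u):
    #   min_v 0.5*||B v - y||^2 + 0.5*lam*||v||^2  =>  (B^T B + lam I) v = B^T y.
    B = A * u                                   # A @ diag(u), via broadcasting
    v = np.linalg.solve(B.T @ B + lam * np.eye(p), B.T @ y)
    r = B @ v - y
    val = 0.5 * r @ r + 0.5 * lam * (u @ u + v @ v)
    # Gradient in u at the inner minimizer; the dv/du terms vanish by the
    # envelope theorem because v is optimal for the inner problem.
    grad = (A.T @ r) * v + lam * u
    return val, grad

u0 = np.ones(p)                                 # any strictly nonzero initialization
res = minimize(varpro_objective, u0, jac=True, method="L-BFGS-B")
u = res.x

# Recover v and the (approximately sparse) primal iterate x = u * v.
B = A * u
v = np.linalg.solve(B.T @ B + lam * np.eye(p), B.T @ y)
x_hat = u * v
print("entries above 1e-3:", int(np.sum(np.abs(x_hat) > 1e-3)))
```

The elimination of v is what the abstract calls Variable Projection: the reduced objective depends on u alone, which tends to be better conditioned than the joint (u, v) problem and lets a generic quasi-Newton routine do the work; the closed-form ridge solve shown here is only one possible way to realize the inner minimization.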