A Bregman Learning Framework for Sparse Neural Networks

Cited by: 0
Authors
Bungert, Leon [1 ]
Roith, Tim [2 ]
Tenbrinck, Daniel [2 ]
Burger, Martin [2 ]
Affiliations
[1] Univ Bonn, Hausdorff Ctr Math, Endenicher Allee 62, D-53115 Bonn, Germany
[2] Friedrich Alexander Univ Erlangen Nurnberg, Dept Math, Cauerstr 11, D-91058 Erlangen, Germany
Keywords
Bregman Iterations; Mirror Descent; Sparse Neural Networks; Sparsity; Inverse Scale Space; Optimization; GRADIENT DESCENT; REGULARIZATION; ITERATION; SELECTION
DOI
Not available
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
We propose a learning framework based on stochastic Bregman iterations, also known as mirror descent, to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated version using momentum, and AdaBreg, which is a Bregmanized generalization of the Adam algorithm. In contrast to established methods for sparse training, the proposed family of algorithms constitutes a regrowth strategy for neural networks that is purely optimization-based, without additional heuristics. Our Bregman learning framework starts the training with very few initial parameters and successively adds only significant ones, yielding a sparse and expressive network. The proposed approach is simple and efficient, yet supported by the rich mathematical theory of inverse scale space methods. We derive a statistically grounded sparse parameter initialization strategy and provide a rigorous stochastic convergence analysis of the loss decay, together with additional convergence proofs in the convex regime. Using only 3.4% of the parameters of ResNet-18, we achieve 90.2% test accuracy on CIFAR-10, compared to 93.6% with the dense network. Our algorithm also unveils an autoencoder architecture for a denoising task. The proposed framework further has significant potential for integrating sparse backpropagation and resource-friendly training. Code is available at https://github.com/TimRoith/BregmanLearning.
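To make the inverse scale space idea from the abstract concrete, the following is a minimal sketch of one stochastic linearized Bregman (mirror descent) update, written under the assumption of an elastic-net regularizer $J(\theta) = \lambda\|\theta\|_1 + \tfrac{1}{2}\|\theta\|_2^2$; the paper's exact regularizer scaling, step-size rule, and initialization may differ.

\[
\begin{aligned}
v^{k+1} &= v^{k} - \tau\,\hat{\nabla} L(\theta^{k}), \\
\theta^{k+1} &= \nabla J^{*}(v^{k+1}) = \operatorname{sign}(v^{k+1}) \odot \max\!\left(|v^{k+1}| - \lambda,\ 0\right),
\end{aligned}
\]

where $\hat{\nabla} L(\theta^{k})$ is a stochastic (mini-batch) gradient of the loss, and $\odot$, $|\cdot|$, and $\max$ act componentwise. A parameter stays exactly zero until its accumulated dual variable $v$ crosses the threshold $\lambda$, so training can start from very few nonzero parameters and only significant ones are activated over time, which is the regrowth behaviour described in the abstract.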
Pages: 43