A Bregman Learning Framework for Sparse Neural Networks

Cited by: 0
Authors
Bungert, Leon [1 ]
Roith, Tim [2 ]
Tenbrinck, Daniel [2 ]
Burger, Martin [2 ]
Affiliations
[1] Univ Bonn, Hausdorff Ctr Math, Endenicher Allee 62, D-53115 Bonn, Germany
[2] Friedrich Alexander Univ Erlangen Nurnberg, Dept Math, Cauerstr 11, D-91058 Erlangen, Germany
Keywords
Bregman Iterations; Mirror Descent; Sparse Neural Networks; Sparsity; Inverse Scale Space; Optimization; GRADIENT DESCENT; REGULARIZATION; ITERATION; SELECTION
DOI
Not available
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
We propose a learning framework based on stochastic Bregman iterations, also known as mirror descent, to train sparse neural networks with an inverse scale space approach. We derive a baseline algorithm called LinBreg, an accelerated version using momentum, and AdaBreg, which is a Bregmanized generalization of the Adam algorithm. In contrast to established methods for sparse training, the proposed family of algorithms constitutes a regrowth strategy for neural networks that is purely optimization-based, without additional heuristics. Our Bregman learning framework starts the training with very few initial parameters and successively adds only significant ones, yielding a sparse and expressive network. The proposed approach is simple and efficient, yet supported by the rich mathematical theory of inverse scale space methods. We derive a statistically grounded sparse parameter initialization strategy and provide a rigorous stochastic convergence analysis of the loss decay, together with additional convergence proofs in the convex regime. Using only 3.4% of the parameters of ResNet-18, we achieve 90.2% test accuracy on CIFAR-10, compared to 93.6% with the dense network. Our algorithm also unveils an autoencoder architecture for a denoising task. The proposed framework further has significant potential for integrating sparse backpropagation and resource-friendly training. Code is available at https://github.com/TimRoith/BregmanLearning.
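To make the inverse scale space idea from the abstract concrete, the following is a minimal sketch of one stochastic linearized Bregman (mirror descent) update, written under the assumption of an elastic-net regularizer $J(\theta) = \lambda\|\theta\|_1 + \tfrac{1}{2}\|\theta\|_2^2$; the paper's exact regularizer scaling, step-size rule, and initialization may differ.

\[
\begin{aligned}
v^{k+1} &= v^{k} - \tau\,\hat{\nabla} L(\theta^{k}), \\
\theta^{k+1} &= \nabla J^{*}(v^{k+1}) = \operatorname{sign}(v^{k+1}) \odot \max\!\left(|v^{k+1}| - \lambda,\ 0\right),
\end{aligned}
\]

where $\hat{\nabla} L(\theta^{k})$ is a stochastic (mini-batch) gradient of the loss, and $\odot$, $|\cdot|$, and $\max$ act componentwise. A parameter stays exactly zero until its accumulated dual variable $v$ crosses the threshold $\lambda$, so training can start from very few nonzero parameters and only significant ones are activated over time, which is the regrowth behaviour described in the abstract.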
Pages: 43