Shallow Univariate ReLU Networks as Splines: Initialization, Loss Surface, Hessian, and Gradient Flow Dynamics

Cited by: 1
Authors
Sahs, Justin [1]
Pyle, Ryan [1]
Damaraju, Aneel [2]
Caro, Josue Ortega [1]
Tavaslioglu, Onur [3]
Lu, Andy [2]
Anselmi, Fabio [1]
Patel, Ankit B. [1,2]
Affiliations
[1] Baylor Coll Med, Dept Neurosci, Houston, TX 77030 USA
[2] Rice Univ, Dept Elect Engn, Houston, TX 77005 USA
[3] Rice Univ, Dept Computat & Appl Math, Houston, TX USA
Source
FRONTIERS IN ARTIFICIAL INTELLIGENCE | 2022, Vol. 5
Keywords
neural networks; symmetry; implicit bias; splines; learning dynamics; knots
DOI
10.3389/frai.2022.889981
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Understanding the learning dynamics and inductive bias of neural networks (NNs) is hindered by the opacity of the relationship between NN parameters and the function they represent. This opacity is partly due to symmetries inherent in the NN parameterization: many different parameter settings produce the same output function, which both obscures the parameter-function relationship and introduces redundant degrees of freedom. The NN parameterization is invariant under two symmetries: permutation of the neurons and a continuous family of rescalings of the weight and bias parameters. We propose taking a quotient with respect to the second symmetry group and reparametrizing ReLU NNs as continuous piecewise linear splines. Using this spline lens, we study learning dynamics in shallow univariate ReLU NNs, finding unexpected insights and explanations for several perplexing phenomena. We develop a surprisingly simple and transparent view of the structure of the loss surface, including its critical and fixed points, Hessian, and Hessian spectrum. We also show that standard weight initializations yield very flat initial functions, and that this flatness, together with overparametrization and the initial weight scale, is responsible for the strength and type of implicit regularization, consistent with previous work. Our implicit regularization results complement recent work showing, via a kernel-based argument, that initialization scale critically controls implicit regularization. Overall, removing the weight-scale symmetry lets us prove these results more simply, establish new results, and gain new insights, while offering a far more transparent and intuitive picture. Looking forward, our quotiented spline-based approach will extend naturally to the multivariate and deep settings, and alongside the kernel-based view, we believe it will play a foundational role in efforts to understand neural networks. Videos of learning dynamics using a spline-based visualization are available at http://shorturl.at/tFWZ2.
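As a concrete illustration of the quotient described in the abstract, the following minimal sketch (our own illustration, not the authors' code; the variable names w, b, v, c and the specific knot coordinates used are assumptions) checks two facts for a random shallow univariate ReLU network f(x) = sum_i v_i ReLU(w_i x + b_i) + c: the per-neuron rescaling (w_i, b_i, v_i) -> (a w_i, a b_i, v_i / a) with a > 0 leaves the represented function unchanged, and spline-style coordinates, namely the knot location -b_i / w_i and the slope change v_i |w_i| at that knot, are invariant under this rescaling, so they serve as one possible set of coordinates on the quotient.

# Minimal sketch: scaling symmetry and knot-coordinate invariance for a
# shallow univariate ReLU network (illustrative only; not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
H = 8                                   # hidden width (arbitrary choice)
w = rng.standard_normal(H)              # input weights
b = rng.standard_normal(H)              # biases
v = rng.standard_normal(H)              # output weights
c = rng.standard_normal()               # output bias

def f(x, w, b, v, c):
    """Evaluate f(x) = sum_i v_i * relu(w_i * x + b_i) + c at points x."""
    return np.maximum(w * x[:, None] + b, 0.0) @ v + c

x = np.linspace(-3.0, 3.0, 200)

# (1) Rescale each neuron by an arbitrary positive factor a_i:
#     (w, b, v) -> (a*w, a*b, v/a) leaves the output function unchanged.
a = rng.uniform(0.5, 2.0, H)
assert np.allclose(f(x, w, b, v, c), f(x, a * w, a * b, v / a, c))

# (2) Spline ("knot") coordinates are invariant under the same rescaling:
#     knot location -b/w and slope change v*|w| at the knot.
knots = -b / w
slope_change = v * np.abs(w)
assert np.allclose(knots, -(a * b) / (a * w))
assert np.allclose(slope_change, (v / a) * np.abs(a * w))
print("scaling symmetry and knot-coordinate invariance verified")

Running the script prints the confirmation message; the asserts fail if either the rescaling changes the function or the knot coordinates fail to be invariant.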
Pages: 16