On the training dynamics of deep networks with L2 regularization

Times Cited: 0
Authors
Lewkowycz, Aitor [1 ]
Gur-Ari, Guy [1 ]
Affiliations
[1] Google, Mountain View, CA 94043 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020 / Vol. 33
Keywords
(none listed)
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study the role of L2 regularization in deep learning, and uncover simple relations between the performance of the model, the L2 coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of L2 regularization in this context with that of linear models.
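In a gradient-descent update, the L2 coefficient (lambda) and the learning rate (lr) enter the weight-decay term only through their product, and the paper's empirical relations tie the number of steps to peak performance to roughly 1/(lr * lambda) in the overparameterized regime. Below is a minimal sketch of where that product appears, assuming toy linear-regression data in place of the deep networks the paper studies; the train() helper and all names are illustrative, not the authors' code.

```python
# A minimal sketch, not code from the paper: it shows how the L2
# coefficient (lmbda) and the learning rate (lr) enter a plain
# gradient-descent update. The toy linear-regression data and the
# train() helper are illustrative assumptions standing in for the
# deep networks studied in the paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))   # toy inputs
y = X @ rng.normal(size=10)      # toy targets from a linear teacher

def train(lr, lmbda, steps):
    """Full-batch gradient descent on MSE with an L2 penalty."""
    w = np.zeros(10)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of the data-fit loss
        w -= lr * (grad + lmbda * w)       # L2 term shrinks w by lr * lmbda per step
    return float(np.mean((X @ w - y) ** 2))

# The L2 contribution to each update is lr * lmbda * w, so halving
# lmbda roughly doubles the number of steps needed for the same total
# decay -- an analogue of the paper's observation that training time
# scales like 1/(lr * lmbda).
print(train(lr=0.05, lmbda=0.02, steps=500))
print(train(lr=0.05, lmbda=0.01, steps=1000))
```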
Pages: 10
Related Papers
50 items in total
  • [31] Enhancing relative humidity modelling using L2 regularization updates
    Ben Yahia, Abdellah
    Kadir, Iman
    Abdallaoui, Abdelaziz
    El-Hmaidi, Abdellah
    SCIENTIFIC REPORTS, 15 (1)
  • [32] Blind Image Restoration Based on l1-l2 Blur Regularization
    Xiao, Su
    ENGINEERING LETTERS, 2020, 28 (01) : 148 - 154
  • [33] An Improved Variable Kernel Density Estimator Based on L2 Regularization
    Jin, Yi
    He, Yulin
    Huang, Defa
    MATHEMATICS, 2021, 9 (16)
  • [34] Weighted Multiview K-Means Clustering with L2 Regularization
    Hussain, Ishtiaq
    Nataliani, Yessica
    Ali, Mehboob
    Hussain, Atif
    Mujlid, Hana M.
    Almaliki, Faris A.
    Rahimi, Nouf M.
    SYMMETRY-BASEL, 2024, 16 (12)
  • [35] A Hidden Feature Selection Method based on l2,0-Norm Regularization for Training Single-hidden-layer Neural Networks
    Liu, Zhiwei
    Yu, Yuanlong
    Sun, Zhenzhen
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 1810 - 1817
  • [36] Batch Normalization and Dropout Regularization in Training Deep Neural Networks with Label Noise
    Rusiecki, Andrzej
    INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 57 - 66
  • [37] ISING-DROPOUT: A REGULARIZATION METHOD FOR TRAINING AND COMPRESSION OF DEEP NEURAL NETWORKS
    Salehinejad, Hojjat
    Valaee, Shahrokh
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3602 - 3606
  • [38] Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows
    Yoo, Gene Ryan
    Owhadi, Houman
    PHYSICA D-NONLINEAR PHENOMENA, 2021, 426
  • [39] Differential impacts of natural L2 immersion and intensive classroom L2 training on cognitive control
    Xie, Zhilong
    Antolovic, Katarina
    QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2022, 75 (03) : 550 - 562
  • [40] Batch gradient method with smoothing L1/2 regularization for training of feedforward neural networks
    Wu, Wei
    Fan, Qinwei
    Zurada, Jacek M.
    Wang, Jian
    Yang, Dakun
    Liu, Yan
    NEURAL NETWORKS, 2014, 50 : 72 - 78