On the training dynamics of deep networks with L2 regularization

Times Cited: 0
Authors
Lewkowycz, Aitor [1 ]
Gur-Ari, Guy [1 ]
Affiliations
[1] Google, Mountain View, CA 94043 USA
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020 | 2020 / Vol. 33
Keywords
(none listed)
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We study the role of L2 regularization in deep learning, and uncover simple relations between the performance of the model, the L2 coefficient, the learning rate, and the number of training steps. These empirical relations hold when the network is overparameterized. They can be used to predict the optimal regularization parameter of a given model. In addition, based on these observations we propose a dynamical schedule for the regularization parameter that improves performance and speeds up training. We test these proposals in modern image classification settings. Finally, we show that these empirical relations can be understood theoretically in the context of infinitely wide networks. We derive the gradient flow dynamics of such networks, and compare the role of L2 regularization in this context with that of linear models.
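In a gradient-descent update, the L2 coefficient (lambda) and the learning rate (lr) enter the weight-decay term only through their product, and the paper's empirical relations tie the number of steps to peak performance to roughly 1/(lr * lambda) in the overparameterized regime. Below is a minimal sketch of where that product appears, assuming toy linear-regression data in place of the deep networks the paper studies; the train() helper and all names are illustrative, not the authors' code.

```python
# A minimal sketch, not code from the paper: it shows how the L2
# coefficient (lmbda) and the learning rate (lr) enter a plain
# gradient-descent update. The toy linear-regression data and the
# train() helper are illustrative assumptions standing in for the
# deep networks studied in the paper.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 10))   # toy inputs
y = X @ rng.normal(size=10)      # toy targets from a linear teacher

def train(lr, lmbda, steps):
    """Full-batch gradient descent on MSE with an L2 penalty."""
    w = np.zeros(10)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / len(y)  # gradient of the data-fit loss
        w -= lr * (grad + lmbda * w)       # L2 term shrinks w by lr * lmbda per step
    return float(np.mean((X @ w - y) ** 2))

# The L2 contribution to each update is lr * lmbda * w, so halving
# lmbda roughly doubles the number of steps needed for the same total
# decay -- an analogue of the paper's observation that training time
# scales like 1/(lr * lmbda).
print(train(lr=0.05, lmbda=0.02, steps=500))
print(train(lr=0.05, lmbda=0.01, steps=1000))
```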
Pages: 10
Related Papers
50 items in total
  • [31] Enhancing relative humidity modelling using L2 regularization updates
    Ben Yahia, Abdellah
    Kadir, Iman
    Abdallaoui, Abdelaziz
    El-Hmaidi, Abdellah
    SCIENTIFIC REPORTS, 15 (1)
  • [32] Blind Image Restoration Based on l1-l2 Blur Regularization
    Xiao, Su
    ENGINEERING LETTERS, 2020, 28 (01) : 148 - 154
  • [33] An Improved Variable Kernel Density Estimator Based on L2 Regularization
    Jin, Yi
    He, Yulin
    Huang, Defa
    MATHEMATICS, 2021, 9 (16)
  • [34] Weighted Multiview K-Means Clustering with L2 Regularization
    Hussain, Ishtiaq
    Nataliani, Yessica
    Ali, Mehboob
    Hussain, Atif
    Mujlid, Hana M.
    Almaliki, Faris A.
    Rahimi, Nouf M.
    SYMMETRY-BASEL, 2024, 16 (12)
  • [35] A Hidden Feature Selection Method based on l2,0-Norm Regularization for Training Single-hidden-layer Neural Networks
    Liu, Zhiwei
    Yu, Yuanlong
    Sun, Zhenzhen
    2019 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI 2019), 2019, : 1810 - 1817
  • [36] Batch Normalization and Dropout Regularization in Training Deep Neural Networks with Label Noise
    Rusiecki, Andrzej
    INTELLIGENT SYSTEMS DESIGN AND APPLICATIONS, ISDA 2021, 2022, 418 : 57 - 66
  • [37] ISING-DROPOUT: A REGULARIZATION METHOD FOR TRAINING AND COMPRESSION OF DEEP NEURAL NETWORKS
    Salehinejad, Hojjat
    Valaee, Shahrokh
    2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2019, : 3602 - 3606
  • [38] Deep regularization and direct training of the inner layers of Neural Networks with Kernel Flows
    Yoo, Gene Ryan
    Owhadi, Houman
    PHYSICA D-NONLINEAR PHENOMENA, 2021, 426
  • [39] Differential impacts of natural L2 immersion and intensive classroom L2 training on cognitive control
    Xie, Zhilong
    Antolovic, Katarina
    QUARTERLY JOURNAL OF EXPERIMENTAL PSYCHOLOGY, 2022, 75 (03) : 550 - 562
  • [40] Batch gradient method with smoothing L1/2 regularization for training of feedforward neural networks
    Wu, Wei
    Fan, Qinwei
    Zurada, Jacek M.
    Wang, Jian
    Yang, Dakun
    Liu, Yan
    NEURAL NETWORKS, 2014, 50 : 72 - 78