Learning dynamics of gradient descent optimization in deep neural networks

Cited by: 19
Authors
Wu, Wei [1]
Jing, Xiaoyuan [1]
Du, Wencai [2]
Chen, Guoliang [3]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] City Univ Macau, Inst Data Sci, Macau 999078, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
learning dynamics; deep neural networks; gradient descent; control model; transfer function;
DOI
10.1007/s11432-020-3163-0
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Stochastic gradient descent (SGD)-based optimizers play a key role in most deep learning models, yet the learning dynamics of these complex models remain obscure. SGD is the basic tool for optimizing model parameters and has been improved in many derived forms, including SGD with momentum and Nesterov accelerated gradient (NAG). However, the effect of the optimizer parameters on the learning dynamics has seldom been studied. We propose to understand the model dynamics from the perspective of control theory. We use the state transfer function to approximate the parameter dynamics of different optimizers as a first- or second-order control system, thus explaining theoretically how the optimizer parameters affect the stability and convergence time of deep learning models, and we verify these findings with numerical experiments.
Pages: 15
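The control-theoretic reading in the abstract can be made concrete with a small numerical sketch. The script below is not taken from the paper; the one-dimensional quadratic loss and the chosen learning rate and momentum coefficient are illustrative assumptions. It compares plain SGD, SGD with momentum, and NAG: SGD behaves like a first-order system whose error decays monotonically, while the extra velocity state in momentum and NAG yields second-order dynamics that can overshoot and oscillate, the trade-off between stability and convergence time that the paper analyzes through transfer functions.

```python
# A minimal numerical sketch (not the paper's code): the quadratic loss, the
# learning rate, and the momentum coefficient below are illustrative assumptions.
# On L(w) = 0.5 * h * w**2, plain SGD acts as a first-order system whose error
# decays monotonically, while momentum and NAG add a velocity state and behave
# like second-order systems that can overshoot and oscillate.

h = 1.0        # curvature of the toy quadratic loss
lr = 0.02      # learning rate (illustrative)
mu = 0.9       # momentum coefficient (illustrative)
steps = 400
w0 = 1.0       # initial parameter; the optimum is w* = 0
tol = 1e-3     # settling tolerance used to estimate convergence time

def grad(w):
    return h * w  # gradient of 0.5 * h * w**2

def run_sgd():
    w, traj = w0, []
    for _ in range(steps):
        w = w - lr * grad(w)                 # w_{t+1} = (1 - lr*h) * w_t  (first order)
        traj.append(w)
    return traj

def run_momentum():
    w, v, traj = w0, 0.0, []
    for _ in range(steps):
        v = mu * v - lr * grad(w)            # velocity state makes the recursion second order
        w = w + v
        traj.append(w)
    return traj

def run_nag():
    w, v, traj = w0, 0.0, []
    for _ in range(steps):
        v = mu * v - lr * grad(w + mu * v)   # gradient at the look-ahead point
        w = w + v
        traj.append(w)
    return traj

for name, traj in [("SGD", run_sgd()), ("Momentum", run_momentum()), ("NAG", run_nag())]:
    # Settling time: first step after which |w| stays within the tolerance band.
    settled = next((t for t in range(steps) if all(abs(x) < tol for x in traj[t:])), steps)
    print(f"{name:8s} settles in ~{settled:3d} steps, final w = {traj[-1]:+.6f}")
```

Running the sketch shows the damped oscillation of the momentum and NAG trajectories around the optimum and, for this choice of parameters, a shorter settling time than plain SGD, consistent with the second-order control-system view.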
Related Papers
50 records in total
  • [1] Wu, Wei; Jing, Xiaoyuan; Du, Wencai; Chen, Guoliang. Learning dynamics of gradient descent optimization in deep neural networks. Science China Information Sciences, 2021, 64.
  • [2] Satpathi, Siddhartha; Srikant, R. The Dynamics of Gradient Descent for Overparametrized Neural Networks. Learning for Dynamics and Control, Vol. 144, 2021.
  • [3] Nguegnang, Gabin Maxime; Rauhut, Holger; Terstiege, Ulrich. Convergence of gradient descent for learning linear neural networks. Advances in Continuous and Discrete Models, 2024, 2024(1).
  • [4] Becker, Martin; Lippel, Jens; Zielke, Thomas. Gradient Descent Analysis: On Visualizing the Training of Deep Neural Networks. Proceedings of the 14th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications, Vol. 3: IVAPP, 2019: 338-345.
  • [5] Yin, Penghang; Zhang, Shuai; Lyu, Jiancheng; Osher, Stanley; Qi, Yingyong; Xin, Jack. Blended coarse gradient descent for full quantization of deep neural networks. Research in the Mathematical Sciences, 2019, 6(1).
  • [6] Yin, Penghang; Zhang, Shuai; Lyu, Jiancheng; Osher, Stanley; Qi, Yingyong; Xin, Jack. Blended coarse gradient descent for full quantization of deep neural networks. Research in the Mathematical Sciences, 2019, 6.
  • [7] Vasudevan, Shrihari. Mutual Information Based Learning Rate Decay for Stochastic Gradient Descent Training of Deep Neural Networks. Entropy, 2020, 22(5).
  • [8] Pan, Hengyue; Niu, Xin; Li, RongChun; Dou, Yong; Jiang, Hui. Annealed gradient descent for deep learning. Neurocomputing, 2020, 380: 201-211.
  • [9] Cheridito, Patrick; Jentzen, Arnulf; Rossmannek, Florian. Non-convergence of stochastic gradient descent in the training of deep neural networks. Journal of Complexity, 2021, 64.
  • [10] Xia, Shuhao; Shi, Yuanming. Learning Shallow Neural Networks via Provable Gradient Descent with Random Initialization. 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019: 5616-5620.