Learning dynamics of gradient descent optimization in deep neural networks

Cited by: 19
Authors
Wu, Wei [1]
Jing, Xiaoyuan [1]
Du, Wencai [2]
Chen, Guoliang [3]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] City Univ Macau, Inst Data Sci, Macau 999078, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
learning dynamics; deep neural networks; gradient descent; control model; transfer function;
DOI
10.1007/s11432-020-3163-0
CLC number
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Stochastic gradient descent (SGD)-based optimizers play a key role in most deep learning models, yet the learning dynamics of these complex models remain obscure. SGD is the basic tool for optimizing model parameters and has been improved in many derived forms, including SGD momentum and Nesterov accelerated gradient (NAG). However, the learning dynamics induced by optimizer hyperparameters have seldom been studied. We propose to understand the model dynamics from the perspective of control theory. We use the transfer function to approximate the parameter dynamics of different optimizers as first- or second-order control systems, thus explaining theoretically how the hyperparameters affect the stability and convergence time of deep learning models, and we verify our findings by numerical experiments.
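To make the control-theoretic reading of the abstract concrete, here is a minimal sketch, reconstructed from the standard update rules rather than taken from the paper itself: treating the gradient g_t as the system input and the parameter θ_t as the output, plain SGD and SGD momentum reduce to first- and second-order discrete-time systems, respectively (η is the learning rate, γ the momentum coefficient).

```latex
% Plain SGD, \theta_{t+1} = \theta_t - \eta g_t, is a first-order system:
\[
  H_{\mathrm{SGD}}(z) = \frac{\Theta(z)}{G(z)} = \frac{-\eta}{z - 1}.
\]
% Momentum, v_{t+1} = \gamma v_t + \eta g_t with \theta_{t+1} = \theta_t - v_{t+1},
% eliminates v to give a second-order system with poles at z = 1 and z = \gamma:
\[
  H_{\mathrm{mom}}(z) = \frac{\Theta(z)}{G(z)} = \frac{-\eta z}{(z - 1)(z - \gamma)}.
\]
```

Closing the loop on a toy quadratic f(θ) = (λ/2)θ², where g_t = λθ_t, makes the updates linear, so stability and convergence time can be read off the spectral radius of the closed-loop matrix. The sketch below is a hypothetical numerical check in this spirit; the function names and the specific (η, γ) grid are illustrative assumptions, not the paper's experiments.

```python
import numpy as np

def closed_loop_matrix(eta, gamma, lam, nesterov=False):
    """State matrix A with x_{t+1} = A x_t for x_t = (theta_t, v_t)
    on f(theta) = 0.5 * lam * theta**2, i.e., g_t = lam * theta_t.

    Momentum: v_{t+1} = gamma*v_t + eta*g_t,  theta_{t+1} = theta_t - v_{t+1}.
    NAG evaluates the gradient at the look-ahead point theta_t - gamma*v_t.
    """
    if nesterov:
        return np.array([[1 - eta * lam, -gamma * (1 - eta * lam)],
                         [eta * lam,      gamma * (1 - eta * lam)]])
    return np.array([[1 - eta * lam, -gamma],
                     [eta * lam,      gamma]])

def spectral_radius(A):
    # The iterates contract like rho**t: rho < 1 means a stable optimizer,
    # and a smaller rho means a shorter convergence time.
    return np.max(np.abs(np.linalg.eigvals(A)))

lam = 1.0
gamma = 0.9
for eta in (0.1, 0.5, 2.0, 4.0):
    rho_m = spectral_radius(closed_loop_matrix(eta, gamma, lam))
    rho_n = spectral_radius(closed_loop_matrix(eta, gamma, lam, nesterov=True))
    print(f"eta={eta}, gamma={gamma}: momentum rho={rho_m:.3f}, NAG rho={rho_n:.3f}")
```

For momentum, the characteristic polynomial of A is z² − (1 + γ − ηλ)z + γ, so the classical stability condition 0 < ηλ < 2(1 + γ) falls out directly; with γ = 0.9, the η = 4.0 row above crosses that bound and the spectral radius exceeds 1.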
Pages: 15
Related papers (10 of 50 shown)
  • [1] Learning dynamics of gradient descent optimization in deep neural networks
    Wei Wu
    Xiaoyuan Jing
    Wencai Du
    Guoliang Chen
    Science China Information Sciences, 2021, 64
  • [2] Learning dynamics of gradient descent optimization in deep neural networks
    Wei WU
    Xiaoyuan JING
    Wencai DU
    Guoliang CHEN
Science China (Information Sciences), 2021, 64 (05) : 17 - 31
  • [3] Evolutionary Stochastic Gradient Descent for Optimization of Deep Neural Networks
    Cui, Xiaodong
    Zhang, Wei
    Tuske, Zoltan
    Picheny, Michael
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [4] Strengthening Gradient Descent by Sequential Motion Optimization for Deep Neural Networks
    Le-Duc, Thang
    Nguyen, Quoc-Hung
    Lee, Jaehong
    Nguyen-Xuan, H.
    IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2023, 27 (03) : 565 - 579
  • [5] Dynamics of on-line gradient descent learning for multilayer neural networks
    Saad, D
    Solla, SA
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 8: PROCEEDINGS OF THE 1995 CONFERENCE, 1996, 8 : 302 - 308
  • [6] The Dynamics of Gradient Descent for Overparametrized Neural Networks
    Satpathi, Siddhartha
    Srikant, R.
    LEARNING FOR DYNAMICS AND CONTROL, VOL 144, 2021, 144
  • [7] Optimization of Graph Neural Networks with Natural Gradient Descent
    Izadi, Mohammad Rasool
    Fang, Yihao
    Stevenson, Robert
    Lin, Lizhen
    2020 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2020, : 171 - 179
  • [8] Learning Deep Gradient Descent Optimization for Image Deconvolution
    Gong, Dong
    Zhang, Zhen
    Shi, Qinfeng
    van den Hengel, Anton
    Shen, Chunhua
    Zhang, Yanning
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2020, 31 (12) : 5468 - 5482
  • [9] Impact of Mathematical Norms on Convergence of Gradient Descent Algorithms for Deep Neural Networks Learning
    Cai, Linzhe
    Yu, Xinghuo
    Li, Chaojie
    Eberhard, Andrew
    Lien Thuy Nguyen
    Chuong Thai Doan
    AI 2022: ADVANCES IN ARTIFICIAL INTELLIGENCE, 2022, 13728 : 131 - 144
  • [10] Learning Graph Neural Networks with Approximate Gradient Descent
    Li, Qunwei
    Zou, Shaofeng
    Zhong, Wenliang
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 8438 - 8446