Learning dynamics of gradient descent optimization in deep neural networks

Cited by: 21
Authors
Wu, Wei [1 ]
Jing, Xiaoyuan [1 ]
Du, Wencai [2 ]
Chen, Guoliang [3 ]
Affiliations
[1] Wuhan Univ, Sch Comp Sci, Wuhan 430072, Peoples R China
[2] City Univ Macau, Inst Data Sci, Macau 999078, Peoples R China
[3] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen 518060, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
learning dynamics; deep neural networks; gradient descent; control model; transfer function;
DOI
10.1007/s11432-020-3163-0
CLC number
TP [Automation Technology, Computer Technology];
Discipline code
0812;
Abstract
Stochastic gradient descent (SGD)-based optimizers play a key role in most deep learning models, yet the learning dynamics of these complex models remain obscure. SGD is the basic tool for optimizing model parameters and has been refined into many derived forms, including SGD with momentum and Nesterov accelerated gradient (NAG). However, the learning dynamics of the optimizer parameters themselves have seldom been studied. We propose to understand model dynamics from the perspective of control theory. We use the transfer function to approximate the parameter dynamics of different optimizers as first- or second-order control systems, thus explaining how these parameters theoretically affect the stability and convergence time of deep learning models, and we verify our findings by numerical experiments.
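The control-system reading sketched in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' code: on a 1-D quadratic loss, plain SGD reduces to a first-order linear recurrence, while heavy-ball momentum and NAG reduce to second-order ones, so stability can be checked from the characteristic roots. The quadratic loss, the values of lam, lr, mu, and the helper names run_sgd, run_heavy_ball, run_nag are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch (not the paper's code): optimizer dynamics on a 1-D quadratic
# loss f(theta) = 0.5 * lam * theta**2, whose gradient is lam * theta.
# On this loss, plain SGD is a first-order linear recurrence and heavy-ball
# momentum / NAG are second-order ones, matching the control-system view
# described in the abstract. lam, lr, mu, and steps are illustrative choices.

lam, lr, mu, steps = 1.0, 0.5, 0.9, 60
grad = lambda theta: lam * theta

def run_sgd(theta=1.0):
    traj = [theta]
    for _ in range(steps):
        theta = theta - lr * grad(theta)          # first-order recurrence
        traj.append(theta)
    return np.array(traj)

def run_heavy_ball(theta=1.0):
    traj, v = [theta], 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(theta)             # second-order recurrence
        theta = theta + v
        traj.append(theta)
    return np.array(traj)

def run_nag(theta=1.0):
    traj, v = [theta], 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(theta + mu * v)    # look-ahead gradient
        theta = theta + v
        traj.append(theta)
    return np.array(traj)

if __name__ == "__main__":
    for name, traj in [("SGD", run_sgd()),
                       ("heavy-ball", run_heavy_ball()),
                       ("NAG", run_nag())]:
        print(f"{name:10s} |theta_final| = {abs(traj[-1]):.2e}")
    # Discrete-time stability checks on the quadratic:
    #   SGD:        |1 - lr*lam| < 1                        <=> 0 < lr*lam < 2
    #   heavy-ball: roots of z^2 - (1 + mu - lr*lam) z + mu inside unit circle
    print("SGD stable:       ", abs(1 - lr * lam) < 1)
    roots = np.roots([1.0, -(1.0 + mu - lr * lam), mu])
    print("heavy-ball stable:", bool(np.all(np.abs(roots) < 1)))
```

With these settings all three recurrences contract toward zero; pushing lr * lam beyond 2 * (1 + mu) moves the heavy-ball characteristic roots outside the unit circle, which is the second-order stability condition this toy check reports.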
Pages: 15