Training Deep Neural Networks Using Conjugate Gradient-like Methods

Cited by: 6
Authors
Iiduka, Hideaki [1 ]
Kobayashi, Yu [1 ]
Affiliations
[1] Meiji Univ, Dept Comp Sci, Tama Ku, 1-1-1 Higashimita, Kawasaki, Kanagawa 2148571, Japan
Keywords
adaptive learning rate optimization algorithms; conjugate gradient-like method; deep neural network; nonconvex optimization; CONVEX-OPTIMIZATION PROBLEM; FIXED-POINT SET; CLASSIFICATION;
DOI
10.3390/electronics9111809
CLC Number
TP [Automation Technology, Computer Technology]
Subject Classification Number
0812
Abstract
The goal of this article is to accelerate the training of deep neural networks beyond what useful adaptive learning rate optimization algorithms such as AdaGrad, RMSProp, Adam, and AMSGrad achieve. To reach this goal, we devise an iterative algorithm that combines the existing adaptive learning rate optimization algorithms with conjugate gradient-like methods, which are useful for constrained optimization. Convergence analyses show that the proposed algorithm with a small constant learning rate approximates a stationary point of a nonconvex optimization problem in deep learning, and that the proposed algorithm with diminishing learning rates converges to such a stationary point. The convergence and performance of the algorithm are demonstrated through numerical comparisons with the existing adaptive learning rate optimization algorithms on image and text classification tasks. The numerical results show that the proposed algorithm with a constant learning rate is superior for training neural networks.
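The abstract describes combining an adaptive learning rate method (Adam/AMSGrad-style second-moment scaling) with a conjugate gradient-like search direction, i.e. the negative gradient plus a multiple of the previous direction. A minimal sketch of that idea follows; the function name, the fixed blending coefficient `gamma`, and the element-wise AMSGrad-style scaling are illustrative assumptions, not the authors' precise algorithm from the paper.

```python
import numpy as np

def cg_like_adam_step(w, grad, state, lr=1e-3, gamma=0.9, beta2=0.999, eps=1e-8):
    """One update combining a conjugate gradient-like direction with an
    AMSGrad-style adaptive learning rate. Hypothetical sketch only: the
    exact blending rule in the paper may differ."""
    # Conjugate gradient-like direction: steepest descent plus a
    # multiple (gamma) of the previous search direction.
    d = -grad + gamma * state.get("d", np.zeros_like(w))
    # Exponential moving average of squared gradients, as in RMSProp/Adam.
    v = beta2 * state.get("v", np.zeros_like(w)) + (1 - beta2) * grad**2
    # AMSGrad-style running maximum keeps the effective step non-increasing.
    v_hat = np.maximum(state.get("v_hat", np.zeros_like(w)), v)
    w_new = w + lr * d / (np.sqrt(v_hat) + eps)
    state.update(d=d, v=v, v_hat=v_hat)
    return w_new, state

# Toy usage: minimize f(w) = ||w||^2 / 2, whose gradient is w itself.
w, state = np.array([3.0, -4.0]), {}
for _ in range(2000):
    w, state = cg_like_adam_step(w, w, state, lr=1e-2)
```

With a small constant learning rate the iterates approach the stationary point w = 0, mirroring the approximation result stated in the abstract.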
Pages: 1-25
Page count: 25