aSNAQ: An adaptive stochastic Nesterov's accelerated quasi-Newton method for training RNNs

Cited by: 0
Authors
Sendilkumaar, Indrapriyadarsini [1]
Mahboubi, Shahrzad [2 ]
Ninomiya, Hiroshi [2 ]
Asai, Hideki [3 ]
Affiliations
[1] Shizuoka Univ, Grad Sch Sci & Technol, Naka Ku, 3-5-1 Johoku, Hamamatsu, Shizuoka 4328561, Japan
[2] Shonan Inst Technol, Grad Sch Elect & Informat Engn, 1-1-25 Tsujido Nishikaigan, Fujisawa, Kanagawa 2518511, Japan
[3] Shizuoka Univ, Res Inst Elect, Naka Ku, 3-5-1 Johoku, Hamamatsu, Shizuoka 4328561, Japan
Source
IEICE NONLINEAR THEORY AND ITS APPLICATIONS | 2020, Vol. 11, No. 4
Keywords
Recurrent neural network; training algorithm; Nesterov's accelerated quasi-Newton; stochastic method; Tensorflow; OPTIMIZATION;
DOI
10.1587/nolta.11.409
Chinese Library Classification
O1 [Mathematics]
Discipline Codes
0701; 070101
Abstract
Recurrent Neural Networks (RNNs) are powerful sequence models that are particularly difficult to train. This paper proposes an adaptive stochastic Nesterov's accelerated quasi-Newton (aSNAQ) method for training RNNs. Several algorithms have been proposed earlier for training RNNs. However, due to high computational complexity, very few methods use second-order curvature information despite its ability to improve convergence. The proposed method is an accelerated second-order method that incorporates curvature information while maintaining a low per-iteration cost. Furthermore, direction normalization is introduced to mitigate the vanishing and/or exploding gradient problem that is prominent in training RNNs. The performance of the proposed method is evaluated in TensorFlow on benchmark sequence modeling problems. The results show that the proposed aSNAQ method is effective in training RNNs, with a low per-iteration cost and improved performance compared to the second-order adaQN method and the first-order Adagrad and Adam methods.
Pages: 409-421 (13 pages)
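The record itself contains no code, so the following is a minimal NumPy sketch of the three ingredients the abstract names: a stochastic Nesterov look-ahead step, a limited-memory (L-BFGS-style) curvature approximation, and direction normalization. The toy least-squares objective, the normalization rule (capping the direction norm at 1), and all hyperparameters are illustrative assumptions, not the paper's actual aSNAQ algorithm.

import numpy as np

rng = np.random.default_rng(0)

# Toy stochastic objective: mini-batch least squares on synthetic data,
# standing in for the RNN training loss used in the paper.
X = rng.normal(size=(1000, 20))
w_true = rng.normal(size=20)
targets = X @ w_true + 0.1 * rng.normal(size=1000)

def batch_grad(w, idx):
    xb, tb = X[idx], targets[idx]
    return xb.T @ (xb @ w - tb) / len(idx)

def two_loop(g, S, Y):
    # L-BFGS two-loop recursion: multiplies g by an implicit inverse-Hessian
    # approximation built from the stored curvature pairs (s, y).
    q = g.copy()
    alphas = []
    for s, yv in zip(reversed(S), reversed(Y)):
        rho = 1.0 / (yv @ s)
        a = rho * (s @ q)
        alphas.append((a, rho))
        q -= a * yv
    if S:  # standard initial scaling gamma = s'y / y'y
        q *= (S[-1] @ Y[-1]) / (Y[-1] @ Y[-1])
    for (s, yv), (a, rho) in zip(zip(S, Y), reversed(alphas)):
        q += (a - rho * (yv @ q)) * s
    return q

# Hypothetical hyperparameters, chosen for the toy problem only.
mu, lr, memory, batch_size = 0.9, 0.1, 10, 32
w, v = np.zeros(20), np.zeros(20)
S, Y = [], []

for k in range(300):
    idx = rng.choice(len(X), size=batch_size, replace=False)
    w_ahead = w + mu * v                    # Nesterov look-ahead point
    g = batch_grad(w_ahead, idx)
    d = -two_loop(g, S, Y)                  # quasi-Newton search direction
    d /= max(1.0, np.linalg.norm(d))        # direction normalization: cap the
                                            # step to curb exploding updates
    v = mu * v + lr * d
    w_new = w + v
    # Curvature pair between the new iterate and the look-ahead point,
    # measured on the same mini-batch for consistency.
    s_k, y_k = w_new - w_ahead, batch_grad(w_new, idx) - g
    if s_k @ y_k > 1e-10:                   # keep only positive-curvature pairs
        S.append(s_k); Y.append(y_k)
        if len(S) > memory:
            S.pop(0); Y.pop(0)
    w = w_new

print("parameter error:", np.linalg.norm(w - w_true))

On a real RNN, the two gradient calls per iteration would be TensorFlow gradient evaluations at the look-ahead point and the new iterate; keeping the curvature memory small is what gives the low per-iteration cost the abstract emphasizes.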