The vanishing gradient problem during learning recurrent neural nets and problem solutions

Cited by: 1722
Authors
Hochreiter, S [1]
Affiliations
[1] Tech Univ Munich, Inst Informat, D-80290 Munchen, Germany
Keywords
recurrent neural nets; vanishing gradient; long-term dependencies; Long Short-Term Memory;
DOI
10.1142/S0218488598000094
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Recurrent nets are in principle capable of storing past inputs to produce the currently desired output, which is why they are used in time series prediction and process control. Practical applications involve temporal dependencies spanning many time steps between relevant inputs and desired outputs. In this case, however, gradient-based learning methods take too much time: learning becomes extremely slow because the error vanishes as it is propagated back. This article first analyzes the decaying error flow theoretically, then briefly discusses methods that try to overcome vanishing gradients, and finally presents experiments comparing conventional algorithms with alternative methods. With advanced methods, long time lag problems can be solved in reasonable time.
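As a rough illustration of the decaying error flow described in the abstract, the following NumPy sketch backpropagates an error vector through a plain recurrent net and prints its shrinking norm. The network size, weight scale, and tanh activation are illustrative assumptions, not the setup analyzed in the paper.

# Minimal numerical sketch of the decaying error flow (illustrative only:
# the hidden size, weight scale, and tanh activation are assumptions,
# not the configuration studied in the paper).
import numpy as np

rng = np.random.default_rng(0)
n = 20                                   # number of hidden units (assumed)
T = 50                                   # number of time steps (assumed)
W = rng.normal(scale=0.3 / np.sqrt(n), size=(n, n))   # small recurrent weights

# Forward pass of a plain recurrent net: h_t = tanh(W h_{t-1} + input).
h = np.zeros(n)
hs = []
for t in range(T):
    h = np.tanh(W @ h + 0.1 * rng.normal(size=n))
    hs.append(h)

# Backward pass: dL/dh_{t-1} = W^T (dL/dh_t * (1 - h_t^2)).
delta = np.ones(n)                       # error injected at the final step
for t in reversed(range(T)):
    delta = W.T @ (delta * (1.0 - hs[t] ** 2))
    if t % 10 == 0:
        print(f"step {t:3d}: ||error|| = {np.linalg.norm(delta):.3e}")

Because the recurrent weights are small, the printed error norm shrinks by many orders of magnitude over the 50 steps; this exponentially decaying error flow is what makes gradient-based learning of long time lags so slow.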
Pages: 107-116
Number of pages: 10