Learning deep hierarchical and temporal recurrent neural networks with residual learning

Cited by: 1
Authors
Tehseen Zia
Assad Abbas
Usman Habib
Muhammad Sajid Khan
Affiliations
[1] COMSATS University Islamabad, Department of Computer Science
[2] National University of Computer and Emerging Sciences (FAST-NUCES), College of Computer Science
[3] Sichuan University
Source
International Journal of Machine Learning and Cybernetics | 2020, Vol. 11
Keywords
Deep learning; Recurrent neural networks; Residual learning; Long short-term memory; Sequence modeling
DOI: not available
Abstract
Learning both hierarchical and temporal dependencies can be crucial for recurrent neural networks (RNNs) to deeply understand sequences. To this end, a unified RNN framework is required that eases the learning of both deep hierarchical and temporal structures by allowing gradients to propagate back from both ends without vanishing. Residual learning (RL) has emerged as an effective and inexpensive way to facilitate the backward propagation of gradients. However, the significance of RL has so far been shown separately for learning deep hierarchical representations and for learning temporal dependencies, and there has been little effort to unify these findings into a single framework for learning deep RNNs. In this study, we aim to show that approximating identity mappings is crucial for optimizing both hierarchical and temporal structures. We propose a framework, called hierarchical and temporal residual RNNs, that learns RNNs by approximating identity mappings across hierarchical and temporal structures. To validate the proposed method, we explore the efficacy of employing shortcut connections for training deep RNN structures on sequence learning problems. Experiments are performed on the Penn Treebank, Hutter Prize, and IAM-OnDB datasets, and the results demonstrate the utility of the framework in terms of accuracy and computational complexity. We show that, even for large datasets, spending parameters on increased network depth can yield computational benefits through a reduced size of the RNN "state".
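The abstract describes shortcut (residual) connections applied along both the depth (hierarchical) and time (temporal) axes of an RNN so that gradients have identity paths in both directions. The following is a minimal sketch of that idea in PyTorch; the class names, layer sizes, and exact placement of the shortcuts are illustrative assumptions, not the authors' published formulation.

```python
import torch
import torch.nn as nn


class ResidualRNNCell(nn.Module):
    """Vanilla RNN cell with a temporal residual connection: the new hidden
    state is the previous hidden state plus a learned update, giving an
    identity path for gradients flowing back through time."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h_prev):
        update = torch.tanh(self.linear(torch.cat([x, h_prev], dim=-1)))
        return h_prev + update  # temporal identity shortcut


class HierarchicalTemporalResidualRNN(nn.Module):
    """Stack of residual RNN cells with hierarchical shortcuts: each layer's
    output is the lower layer's output plus that layer's hidden state, so an
    identity path also exists across depth (hypothetical arrangement)."""

    def __init__(self, input_size, hidden_size, num_layers):
        super().__init__()
        self.hidden_size = hidden_size
        self.input_proj = nn.Linear(input_size, hidden_size)
        self.cells = nn.ModuleList(
            ResidualRNNCell(hidden_size, hidden_size) for _ in range(num_layers)
        )

    def forward(self, x):
        # x: (batch, time, input_size)
        batch, steps, _ = x.shape
        hidden = [x.new_zeros(batch, self.hidden_size) for _ in self.cells]
        outputs = []
        for t in range(steps):
            layer_in = self.input_proj(x[:, t])
            for i, cell in enumerate(self.cells):
                hidden[i] = cell(layer_in, hidden[i])
                layer_in = layer_in + hidden[i]  # hierarchical identity shortcut
            outputs.append(layer_in)
        return torch.stack(outputs, dim=1)


# Usage: run a random batch through a 3-layer model.
model = HierarchicalTemporalResidualRNN(input_size=16, hidden_size=32, num_layers=3)
out = model(torch.randn(4, 10, 16))
print(out.shape)  # torch.Size([4, 10, 32])
```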
Pages: 873–882
Number of pages: 9