Minimal gated unit for recurrent neural networks

Cited by: 197
Authors
Zhou G.-B. [1 ]
Wu J. [1 ]
Zhang C.-L. [1 ]
Zhou Z.-H. [1 ]
Affiliations
[1] National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing
Keywords
deep learning; gated recurrent unit (GRU); gated unit; long short-term memory (LSTM); minimal gated unit (MGU); recurrent neural network
DOI
10.1007/s11633-016-1006-2
Abstract
Recurrent neural networks (RNN) have been very successful in handling sequence data. However, understanding RNN and finding the best practices for RNN learning is a difficult task, partly because there are many competing and complex hidden units, such as the long short-term memory (LSTM) and the gated recurrent unit (GRU). We propose a gated unit for RNN, named the minimal gated unit (MGU), since it contains only one gate, which is a minimal design among all gated hidden units. The design of MGU benefits from evaluation results on LSTM and GRU in the literature. Experiments on various sequence data show that MGU achieves accuracy comparable to GRU, but with a simpler structure, fewer parameters, and faster training. Hence, MGU is suitable for RNN applications. Its simple architecture also means that it is easier to evaluate and tune, and in principle its properties are easier to study both theoretically and empirically. © 2016, Institute of Automation, Chinese Academy of Sciences and Springer-Verlag Berlin Heidelberg.
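To make the one-gate design concrete, the sketch below implements an MGU-style recurrent cell in NumPy. The abstract does not state the update equations, so the forget-gate formulation, the MGUCell class, and all parameter names here are assumptions based on the paper's description of MGU as a GRU-like unit reduced to a single gate; this is an illustrative sketch, not the authors' reference implementation.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class MGUCell:
    """Single-gate recurrent cell: one forget gate f_t both resets the
    previous state inside the candidate and interpolates old/new state:
        f_t     = sigmoid(W_f x_t + U_f h_prev + b_f)
        h_tilde = tanh(W_h x_t + U_h (f_t * h_prev) + b_h)
        h_t     = (1 - f_t) * h_prev + f_t * h_tilde
    """

    def __init__(self, input_size, hidden_size, seed=0):
        rng = np.random.default_rng(seed)
        s = 1.0 / np.sqrt(hidden_size)
        # Forget-gate parameters.
        self.W_f = rng.uniform(-s, s, (hidden_size, input_size))
        self.U_f = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b_f = np.zeros(hidden_size)
        # Candidate-state parameters.
        self.W_h = rng.uniform(-s, s, (hidden_size, input_size))
        self.U_h = rng.uniform(-s, s, (hidden_size, hidden_size))
        self.b_h = np.zeros(hidden_size)

    def step(self, x_t, h_prev):
        f_t = sigmoid(self.W_f @ x_t + self.U_f @ h_prev + self.b_f)
        h_tilde = np.tanh(self.W_h @ x_t + self.U_h @ (f_t * h_prev) + self.b_h)
        return (1.0 - f_t) * h_prev + f_t * h_tilde

# Toy usage: run the cell over a short random sequence.
cell = MGUCell(input_size=4, hidden_size=8)
h = np.zeros(8)
for x_t in np.random.default_rng(1).normal(size=(5, 4)):
    h = cell.step(x_t, h)
print(h.shape)  # (8,)

Compared with a GRU cell, which carries separate reset and update gates, this sketch needs only the two weight blocks (W_f, U_f) and (W_h, U_h), which is where the reduction in parameters claimed in the abstract comes from.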
Pages: 226-234
Number of pages: 8
Related papers (27 in total)
[1] LeCun Y., Bottou L., Bengio Y., Haffner P., Gradient-based learning applied to document recognition, Proceedings of the IEEE, 86, 11, pp. 2278-2324, (1998)
[2] Krizhevsky A., Sutskever I., Hinton G.E., ImageNet classification with deep convolutional neural networks, Proceedings of Advances in Neural Information Processing Systems 25, pp. 1097-1105, (2012)
[3] Cho K., van Merrienboer B., Gulcehre C., Bahdanau D., Bougares F., Schwenk H., Bengio Y., Learning phrase representations using RNN encoder-decoder for statistical machine translation, Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, Association for Computational Linguistics, pp. 1724-1734, (2014)
[4] Sutskever I., Vinyals O., Le Q.V., Sequence to sequence learning with neural networks, Proceedings of Advances in Neural Information Processing Systems 27, pp. 3104-3112, (2014)
[5] Bahdanau D., Cho K., Bengio Y., Neural machine translation by jointly learning to align and translate, International Conference on Learning Representations 2015, (2015)
[6] Graves A., Mohamed A.R., Hinton G., Speech recognition with deep recurrent neural networks, Proceedings of International Conference on Acoustics, Speech and Signal Processing, pp. 6645-6649, (2013)
[7] Xu K., Ba J.L., Kiros R., Cho K., Courville A., Salakhutdinov R., Zemel R.S., Bengio Y., Show, attend and tell: Neural image caption generation with visual attention, Proceedings of the 32nd International Conference on Machine Learning, 37, pp. 2048-2057, (2015)
[8] Karpathy A., Li F.F., Deep visual-semantic alignments for generating image descriptions, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 3128-3137, (2015)
[9] Lebret R., Pinheiro P.O., Collobert R., Phrase-based image captioning, Proceedings of the 32nd International Conference on Machine Learning, 37, pp. 2085-2094, (2015)
[10] Donahue J., Hendricks L.A., Guadarrama S., Rohrbach M., Venugopalan S., Saenko K., Darrell T., Long-term recurrent convolutional networks for visual recognition and description, Proceedings of IEEE International Conference on Computer Vision and Pattern Recognition, pp. 2625-2634, (2015)