Agreement on target-bidirectional recurrent neural networks for sequence-to-sequence learning

Cited by: 0
Authors
Liu L. [1 ]
Finch A. [1 ]
Utiyama M. [1 ]
Sumita E. [1 ]
Affiliations
[1] National Institute of Information and Communications Technology, 3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto
DOI: 10.1613/JAIR.1.12008
Abstract
Recurrent neural networks are extremely appealing for sequence-to-sequence learning tasks. Despite their great success, they typically suffer from a shortcoming: they are prone to generate unbalanced targets with good prefixes but bad suffixes, and thus performance suffers when dealing with long sequences. We propose a simple yet effective approach to overcome this shortcoming. Our approach relies on the agreement between a pair of target-directional RNNs, which generates more balanced targets. In addition, we develop two efficient approximate search methods for agreement that are empirically shown to be almost optimal in terms of either sequence level or non-sequence level metrics. Extensive experiments were performed on three standard sequence-to-sequence transduction tasks: machine transliteration, grapheme-to-phoneme transformation and machine translation. The results show that the proposed approach achieves consistent and substantial improvements, compared to many state-of-the-art systems. © 2020 AI Access Foundation. All rights reserved.
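The agreement idea summarized above can be illustrated with a toy rescoring sketch. All names and scores here are hypothetical: the paper's actual method uses approximate search over two trained directional RNNs, whereas this sketch simply combines given left-to-right (L2R) and right-to-left (R2L) log-probabilities and picks the candidate both directions agree on.

```python
# Toy illustration of agreement-based rescoring between two target directions.
# Scores are made-up log-probabilities, not outputs of real RNNs.

def agreement_score(l2r_logprob, r2l_logprob):
    # Joint objective: a candidate is good only if BOTH directional
    # models assign it high probability (sum of log-probabilities).
    return l2r_logprob + r2l_logprob

def rerank(candidates):
    # candidates: list of (sequence, l2r_logprob, r2l_logprob) tuples.
    # Return the candidate maximizing the joint agreement score.
    return max(candidates, key=lambda c: agreement_score(c[1], c[2]))

candidates = [
    ("good prefix, bad suffix", -2.0, -9.0),  # favored by L2R only
    ("bad prefix, good suffix", -9.0, -2.0),  # favored by R2L only
    ("balanced hypothesis",     -4.0, -4.0),  # acceptable to both
]

best = rerank(candidates)
# best[0] == "balanced hypothesis"  (joint score -8.0 beats -11.0)
```

This mirrors the intuition in the abstract: a left-to-right decoder alone tends to prefer hypotheses with strong prefixes and weak suffixes, while requiring agreement with a right-to-left decoder penalizes such unbalanced outputs.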
Pages: 581-606 (25 pages)
References (52 total)
  • [1] Bahdanau D., Cho K., Bengio Y., Neural machine translation by jointly learning to align and translate, Proceedings of ICLR, (2015)
  • [2] Bengio S., Vinyals O., Jaitly N., Shazeer N., Scheduled sampling for sequence prediction with recurrent neural networks, Advances in Neural Information Processing Systems, pp. 1171-1179, (2015)
  • [3] Bergstra J., Breuleux O., Bastien F., Lamblin P., Pascanu R., Desjardins G., Turian J., Warde-Farley D., Bengio Y., Theano: a CPU and GPU math expression compiler, Proceedings of the Python for Scientific Computing Conference (SciPy), (2010)
  • [4] Bisani M., Ney H., Joint-sequence models for grapheme-to-phoneme conversion, Speech Commun, (2008)
  • [5] Chiang D., A hierarchical phrase-based model for statistical machine translation, Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL'05), pp. 263-270, (2005)
  • [6] Cho K., Van Merrienboer B., Gulcehre C., Bahdanau D., Bougares F., Schwenk H., Bengio Y., Learning phrase representations using RNN encoder-decoder for statistical machine translation, (2014)
  • [7] Clark J. H., Dyer C., Lavie A., Smith N. A., Better hypothesis testing for statistical machine translation: Controlling for optimizer instability, Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies: short papers, 2, pp. 176-181, (2011)
  • [8] Collins M., Roark B., Incremental parsing with the perceptron algorithm, Proceedings of ACL, (2004)
  • [9] Devlin J., Zbib R., Huang Z., Lamar T., Schwartz R., Makhoul J., Fast and robust neural network joint models for statistical machine translation, Proceedings of ACL, (2014)
  • [10] Dyer C., Ballesteros M., Ling W., Matthews A., Smith N. A., Transition-based dependency parsing with stack long short-term memory, Proceedings of ACL, (2015)