Agreement on target-bidirectional recurrent neural networks for sequence-to-sequence learning

Cited: 0
Authors
Liu L. [1 ]
Finch A. [1 ]
Utiyama M. [1 ]
Sumita E. [1 ]
Affiliations
[1] National Institute of Information and Communications Technology, 3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto, Japan
DOI
10.1613/JAIR.1.12008
Abstract
Recurrent neural networks are extremely appealing for sequence-to-sequence learning tasks. Despite their great success, they typically suffer from a shortcoming: they are prone to generating unbalanced targets with good prefixes but bad suffixes, and thus performance suffers when dealing with long sequences. We propose a simple yet effective approach to overcome this shortcoming. Our approach relies on the agreement between a pair of target-directional RNNs, which generates more balanced targets. In addition, we develop two efficient approximate search methods for agreement that are empirically shown to be almost optimal in terms of either sequence-level or non-sequence-level metrics. Extensive experiments were performed on three standard sequence-to-sequence transduction tasks: machine transliteration, grapheme-to-phoneme transformation, and machine translation. The results show that the proposed approach achieves consistent and substantial improvements compared to many state-of-the-art systems. © 2020 AI Access Foundation. All rights reserved.
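The core idea in the abstract, scoring each candidate target under both a left-to-right and a right-to-left decoder and preferring candidates that both directions agree on, can be sketched minimally as follows. The candidate strings and log-scores here are hypothetical stand-ins for illustration, not the paper's trained models or its actual search procedure:

```python
# Toy stand-ins for the two directional decoders: each maps a candidate
# output to a log-probability. The left-to-right model is confident about
# prefixes, the right-to-left model about suffixes (hypothetical values).
L2R_SCORES = {"good prefix, bad suffix": -1.0,
              "balanced output": -1.5,
              "bad prefix, good suffix": -4.0}
R2L_SCORES = {"good prefix, bad suffix": -4.0,
              "balanced output": -1.5,
              "bad prefix, good suffix": -1.0}

def agreement_rescore(candidates):
    """Return the candidate maximizing the joint (summed) directional score."""
    return max(candidates, key=lambda y: L2R_SCORES[y] + R2L_SCORES[y])

best = agreement_rescore(list(L2R_SCORES))
print(best)  # the balanced candidate wins under the joint score
```

Either single-direction model would prefer its own unbalanced candidate (-1.0), but summing the two log-scores makes the balanced candidate (-3.0 jointly) beat both unbalanced ones (-5.0 jointly), which is the intuition behind generating more balanced targets via agreement.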
Pages: 581-606
Page count: 25
References
52 entries in total
  • [41] Sundermeyer M., Alkhouli T., Wuebker J., Ney H., Translation modeling with bidirectional recurrent neural networks, Proceedings of EMNLP, (2014)
  • [42] Sutskever I., Vinyals O., Le Q. V., Sequence to sequence learning with neural networks, NIPS, (2014)
  • [43] Tai K. S., Socher R., Manning C. D., Improved semantic representations from tree-structured long short-term memory networks, Proceedings of ACL, (2015)
  • [44] Tamura A., Watanabe T., Sumita E., Recurrent neural networks for word alignment model, Proceedings of ACL, (2014)
  • [45] Watanabe T., Sumita E., Bidirectional decoding for statistical machine translation, Proceedings of COLING, (2002)
  • [46] Watanabe T., Sumita E., Transition-based neural constituent parsing, Proceedings of ACL, (2015)
  • [47] Wu K., Allauzen C., Hall K. B., Riley M., Roark B., Encoding linear models as weighted finite-state transducers, INTERSPEECH, (2014)
  • [48] Yao K., Zweig G., Sequence-to-sequence neural net models for grapheme-to-phoneme conversion, (2015)
  • [49] Zeiler M. D., ADADELTA: an adaptive learning rate method, (2012)
  • [50] Zhang H., Toutanova K., Quirk C., Gao J., Beyond left-to-right: Multiple decomposition structures for SMT, HLT-NAACL, pp. 12-21, (2013)