Attention over Heads: A Multi-Hop Attention for Neural Machine Translation

Cited by: 0
Authors
Iida, Shohei [1 ]
Kimura, Ryuichiro [1 ]
Cui, Hongyi [1 ]
Hung, Po-Hsuan [1 ]
Utsuro, Takehito [1 ]
Nagata, Masaaki [2 ]
Affiliations
[1] Univ Tsukuba, Grad Sch Syst & Informat Engn, Tsukuba, Ibaraki, Japan
[2] NTT Corp, NTT Commun Sci Labs, Tokyo, Japan
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
In this paper, we propose a multi-hop attention for the Transformer. It refines the attention for an output symbol by integrating the attention of each head, and consists of two hops. The first hop attention is the scaled dot-product attention, the same attention mechanism used in the original Transformer. The second hop attention is a combination of multi-layer perceptron (MLP) attention and a head gate, which efficiently increases the complexity of the model by adding dependencies between heads. We demonstrate that the proposed multi-hop attention significantly outperforms the baseline Transformer in translation accuracy, by +0.85 BLEU points on the IWSLT-2017 German-to-English task and +2.58 BLEU points on the WMT-2017 German-to-English task. We also find that a multi-hop attention requires fewer parameters than stacking another self-attention layer, and that the proposed model converges significantly faster than the original Transformer.
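To make the two-hop design described in the abstract concrete, the PyTorch sketch below reconstructs the mechanism from that description alone; it is not the authors' implementation. The class and parameter names (MultiHopAttention, hop2_mlp, head_gate, d_hop) are hypothetical, and details such as the MLP size and how the head-gate values are combined with the attention-over-heads weights are assumptions. The first hop is ordinary scaled dot-product multi-head attention; the second hop scores each head's output with a small MLP (softmax over heads) and a sigmoid head gate, and re-weights the heads before they are merged.

# Illustrative sketch only; names and design details are assumptions,
# not the authors' implementation.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHopAttention(nn.Module):
    """First hop: standard scaled dot-product multi-head attention.
    Second hop: MLP attention over the per-head outputs plus a head gate,
    adding dependencies between heads before they are combined."""

    def __init__(self, d_model: int, n_heads: int, d_hop: int = 64):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        # Second hop: a small MLP scoring each head's output ...
        self.hop2_mlp = nn.Sequential(
            nn.Linear(self.d_head, d_hop), nn.Tanh(), nn.Linear(d_hop, 1)
        )
        # ... and a gate deciding how much each head contributes (assumed form).
        self.head_gate = nn.Linear(self.d_head, 1)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, query, key, value):
        # query/key/value: (batch, seq_len, d_model)
        b, t_q, _ = query.shape
        t_k = key.shape[1]

        def split(x, t):
            # (b, t, d_model) -> (b, n_heads, t, d_head)
            return x.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

        q = split(self.w_q(query), t_q)
        k = split(self.w_k(key), t_k)
        v = split(self.w_v(value), t_k)

        # First hop: scaled dot-product attention per head.
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        heads = F.softmax(scores, dim=-1) @ v            # (b, h, t_q, d_head)

        # Second hop: MLP attention over heads plus a head gate.
        mlp_scores = self.hop2_mlp(heads).squeeze(-1)    # (b, h, t_q)
        head_attn = F.softmax(mlp_scores, dim=1)         # softmax over the head axis
        gate = torch.sigmoid(self.head_gate(heads)).squeeze(-1)
        weight = (head_attn * gate).unsqueeze(-1)        # (b, h, t_q, 1)

        # Re-weight each head's output, then merge heads as usual.
        out = (heads * weight).transpose(1, 2).reshape(b, t_q, -1)
        return self.w_o(out)

if __name__ == "__main__":
    attn = MultiHopAttention(d_model=512, n_heads=8)
    x = torch.randn(2, 10, 512)
    print(attn(x, x, x).shape)  # torch.Size([2, 10, 512])

Under these assumptions, the second hop adds only the small MLP and the gate (on the order of d_head * d_hop parameters per attention block), which is consistent with the abstract's observation that multi-hop attention is cheaper than stacking another self-attention layer.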
Pages: 217-222
Number of pages: 6
Related Papers
50 records in total
  • [41] Attention-aware Multi-hop Trust Inference in Online Social Networks
    Xu, Rongwei
    Liu, Guanfeng
    Hua, Xianmei
    Ye, Shiqi
    Zhang, Xuyun
    Lu, Junwen
    2022 IEEE 9TH INTERNATIONAL CONFERENCE ON DATA SCIENCE AND ADVANCED ANALYTICS (DSAA), 2022, : 804 - 813
  • [42] A Multi-Hop Attention Deep Model for Aspect-Level Sentiment Classification
    Deng Y.
    Lei H.
    Li X.-Y.
    Lin Y.-O.
    Dianzi Keji Daxue Xuebao/Journal of the University of Electronic Science and Technology of China, 2019, 48 (05): : 759 - 766
  • [43] ConvHiA: convolutional network with hierarchical attention for knowledge graph multi-hop reasoning
    Dengao Li
    Shuyi Miao
    Baofeng Zhao
    Yu Zhou
    Ding Feng
    Jumin Zhao
    Xupeng Niu
    International Journal of Machine Learning and Cybernetics, 2023, 14 : 2301 - 2315
  • [44] HiAM: A Hierarchical Attention based Model for knowledge graph multi-hop reasoning
    Ma, Ting
    Lv, Shangwen
    Huang, Longtao
    Hu, Songlin
    NEURAL NETWORKS, 2021, 143 : 261 - 270
  • [45] Multi-hop Relational Graph Attention Network for Text-to-SQL Parsing
    Liu, Hu
    Shi, Yuliang
    Zhang, Jianlin
    Wang, Xinjun
    Li, Hui
    Kong, Fanyu
    2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN, 2023,
  • [46] Multi-Modal Alignment of Visual Question Answering Based on Multi-Hop Attention Mechanism
    Xia, Qihao
    Yu, Chao
    Hou, Yinong
    Peng, Pingping
    Zheng, Zhengqi
    Chen, Wen
    ELECTRONICS, 2022, 11 (11)
  • [47] Look Harder: A Neural Machine Translation Model with Hard Attention
    Indurthi, Sathish
    Chung, Insoo
    Kim, Sangha
    57TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2019), 2019, : 3037 - 3043
  • [48] Training Deeper Neural Machine Translation Models with Transparent Attention
    Bapna, Ankur
    Chen, Mia Xu
    Firat, Orhan
    Cao, Yuan
    Wu, Yonghui
    2018 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING (EMNLP 2018), 2018, : 3028 - 3033
  • [49] Recursive Annotations for Attention-Based Neural Machine Translation
    Ye, Shaolin
    Guo, Wu
    2017 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP), 2017, : 164 - 167
  • [50] Fine-grained attention mechanism for neural machine translation
    Choi, Heeyoul
    Cho, Kyunghyun
    Bengio, Yoshua
    NEUROCOMPUTING, 2018, 284 : 171 - 176