Enhancing Attention Models via Multi-head Collaboration

Citations: 0
Authors
Wang, Huadong [1 ]
Tu, Mei [1 ]
Affiliations
[1] Samsung Research China - Beijing (SRC-B), Beijing, People's Republic of China
Source
2020 INTERNATIONAL CONFERENCE ON ASIAN LANGUAGE PROCESSING (IALP 2020) | 2020
Keywords
attention model; multi-head attention; multi-head collaboration;
DOI
10.1109/ialp51396.2020.9310460
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Neural attention-based models have recently boosted performance on various NLP tasks. Compared with single-head attention, multi-head attention is more powerful and more widely used: each head independently attends to information from a different feature subspace, producing multiple attention distributions. In this paper, we hypothesize that while current multi-head attention can generate complementary attention distributions, these distributions may not collaborate properly to improve prediction quality. To validate this hypothesis, we propose a simple but effective method that enhances the collaboration of different attention heads, giving each head the chance to rectify its attention scores using those of the other heads. Empirical study shows that the proposed method significantly improves the performance of multi-head attention across a range of NLP tasks, and the experimental results also confirm that the multi-head collaboration problem exists.
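The abstract does not spell out the rectification mechanism. One plausible reading, sketched below under stated assumptions (the function name `collaborative_attention` and the learned head-to-head mixing matrix `W_mix` are hypothetical, not from the paper), is to mix the heads' raw score maps across heads before the softmax, so each head can adjust its scores using the others':

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def collaborative_attention(Q, K, V, W_mix):
    """Multi-head attention in which each head's raw score map is
    replaced by a learned combination of all heads' score maps before
    the softmax (a sketch of one possible collaboration scheme).
    Shapes: Q, K, V are (H, T, d); W_mix is (H, H)."""
    H, T, d = Q.shape
    scores = Q @ K.transpose(0, 2, 1) / np.sqrt(d)   # (H, T, T) per-head scores
    mixed = np.einsum('hg,gij->hij', W_mix, scores)  # head h sees a weighted mix of all heads
    attn = softmax(mixed, axis=-1)                   # (H, T, T) attention distributions
    return attn @ V                                  # (H, T, d) per-head outputs

rng = np.random.default_rng(0)
H, T, d = 4, 5, 8
Q, K, V = (rng.standard_normal((H, T, d)) for _ in range(3))

# With an identity mixing matrix the sketch reduces to standard,
# fully independent multi-head attention.
out_plain = collaborative_attention(Q, K, V, np.eye(H))
```

In this reading, `W_mix` would be learned jointly with the rest of the model, and identity initialization keeps the starting behavior identical to vanilla multi-head attention.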
Pages: 19-23
Page count: 5
References: 26 total