Reinforcement of BERT with Dependency-Parsing Based Attention Mask

Cited: 0
Authors
Mechouma, Toufik [1 ]
Biskri, Ismail [2 ]
Meunier, Jean Guy [1 ]
Affiliations
[1] Univ Quebec Montreal, Montreal, PQ, Canada
[2] Univ Quebec Trois Rivieres, Trois Rivieres, PQ, Canada
Source
ADVANCES IN COMPUTATIONAL COLLECTIVE INTELLIGENCE, ICCCI 2022 | 2022 / Vol. 1653
Keywords
Bert; Transformers; Attention mechanisms; Dependency parsing;
DOI
10.1007/978-3-031-16210-7_9
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The dot-product attention mechanism is among the most recent attention mechanisms and has shown outstanding performance in BERT. In this paper, we propose a dependency-parsing mask that reinforces the padding mask in the multi-head attention units. The padding mask is already used to filter out padding positions; the proposed mask aims to improve BERT's attention filtering. The conducted experiments show that BERT performs better with the proposed mask.
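The idea described in the abstract — restricting attention to dependency-linked tokens on top of the usual padding mask — can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the symmetric treatment of dependency arcs, and the self-attention fallback for every position are assumptions made here for illustration.

```python
import numpy as np

def combined_attention_mask(dep_edges, seq_len, pad_len):
    """Combine a dependency-parse mask with a padding mask (illustrative sketch).

    dep_edges: (head, dependent) index pairs from a dependency parser
    seq_len:   number of real tokens; pad_len: total padded length
    Returns a (pad_len, pad_len) additive mask of 0.0 (keep) / -inf (block),
    to be added to the attention scores before the softmax.
    """
    # Every position may attend to itself (avoids fully masked rows).
    allowed = np.eye(pad_len, dtype=bool)
    # Tokens linked by a dependency arc may attend to each other
    # (symmetric arcs are an assumption of this sketch).
    for head, dep in dep_edges:
        allowed[head, dep] = True
        allowed[dep, head] = True
    # Padding mask: real tokens never attend to padding positions.
    allowed[:seq_len, seq_len:] = False
    return np.where(allowed, 0.0, -np.inf)

def masked_attention(scores, mask):
    """Apply the additive mask, then a row-wise softmax over the scores."""
    s = scores + mask
    s = s - s.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(s)
    return e / e.sum(axis=-1, keepdims=True)
```

For a 3-token sentence padded to length 4 with arcs (0, 1) and (1, 2), token 0 ends up attending only to itself and token 1, while the padding position attends only to itself; each attention row still sums to 1.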
Pages: 112-122
Page count: 11