Roles and Utilization of Attention Heads in Transformer-based Neural Language Models

Citations: 0
Authors:
Jo, Jae-young [1, 2]
Myaeng, Sung-hyon [1]
Affiliations:
[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea
[2] Dingbro AI Res, Daejeon, South Korea
Source:
58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
Sentence encoders based on the transformer architecture have shown promising results on various natural language tasks. The main impetus lies in pre-trained neural language models that capture long-range dependencies among words, owing to multi-head attention, which is unique to the architecture. However, little is known about how linguistic properties are processed, represented, and utilized for downstream tasks among the hundreds of attention heads inside a pre-trained transformer-based model. With the initial goal of examining the roles of attention heads in handling a set of linguistic features, we conducted a set of experiments with ten probing tasks and three downstream tasks on four pre-trained transformer families (GPT, GPT2, BERT, and ELECTRA). Meaningful insights, revealed through heat-map visualization, are then used to propose a relatively simple sentence representation method that takes advantage of the most influential attention heads, resulting in additional performance improvements on the downstream tasks.
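The abstract describes building a sentence representation from the attention heads identified as most influential. Below is a minimal sketch (not the authors' released code) of one way such a representation could be assembled with the Hugging Face transformers library, assuming bert-base-uncased; the influential_heads list and the [CLS]-attention pooling are illustrative placeholders, not the paper's exact head-selection or pooling procedure.

    # Minimal sketch: pool token states with the attention of a few selected heads,
    # then average the per-head vectors into a single sentence representation.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
    model.eval()

    sentence = "Attention heads play different linguistic roles."
    inputs = tokenizer(sentence, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    hidden = outputs.last_hidden_state[0]   # (seq_len, hidden_dim) final-layer token states
    attentions = outputs.attentions         # tuple per layer: (1, num_heads, seq_len, seq_len)

    # Hypothetical result of a probing analysis: (layer, head) pairs judged influential.
    influential_heads = [(8, 3), (10, 7), (11, 1)]

    # For each selected head, take its attention distribution from the [CLS] position
    # and use it as pooling weights over the token states (a simplification: the paper
    # may combine head outputs differently).
    head_vectors = []
    for layer, head in influential_heads:
        cls_attention = attentions[layer][0, head, 0]   # (seq_len,) attention of [CLS] over tokens
        head_vectors.append(cls_attention @ hidden)     # attention-weighted sum of token states

    sentence_repr = torch.stack(head_vectors).mean(dim=0)  # (hidden_dim,) sentence vector
    print(sentence_repr.shape)

In this sketch the head-importance scores would come from the probing and downstream-task analysis described in the paper; here they are hard-coded only to make the pooling step concrete.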
Pages: 3404-3417
Page count: 14
Related papers (50 records in total):
  • [1] Is Transformer-Based Attention Agnostic of the Pretraining Language and Task?
    Martin, R. H. J.
    Visser, R.
    Dunaiski, M.
    SOUTH AFRICAN COMPUTER SCIENCE AND INFORMATION SYSTEMS RESEARCH TRENDS, SAICSIT 2024, 2024, 2159 : 95 - 123
  • [2] A Study on Performance Enhancement by Integrating Neural Topic Attention with Transformer-Based Language Model
    Um, Taehum
    Kim, Namhyoung
    APPLIED SCIENCES-BASEL, 2024, 14 (17):
  • [3] The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
    Wennberg, Ulme
    Henter, Gustav Eje
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 130 - 140
  • [4] Ouroboros: On Accelerating Training of Transformer-Based Language Models
    Yang, Qian
    Huo, Zhouyuan
    Wang, Wenlin
    Huang, Heng
    Carin, Lawrence
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] Transformer-Based Language Models for Software Vulnerability Detection
    Thapa, Chandra
    Jang, Seung Ick
    Ahmed, Muhammad Ejaz
    Camtepe, Seyit
    Pieprzyk, Josef
    Nepal, Surya
    PROCEEDINGS OF THE 38TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2022, 2022, : 481 - 496
  • [6] BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks
    Oh, Jong-Hoon
    Iida, Ryu
    Kloetzer, Julien
    Torisawa, Kentaro
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2103 - 2115
  • [7] A Comparison of Transformer-Based Language Models on NLP Benchmarks
    Greco, Candida Maria
    Tagarelli, Andrea
    Zumpano, Ester
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 490 - 501
  • [8] RadBERT: Adapting Transformer-based Language Models to Radiology
    Yan, An
    McAuley, Julian
    Lu, Xing
    Du, Jiang
    Chang, Eric Y.
    Gentili, Amilcare
    Hsu, Chun-Nan
    RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2022, 4 (04)
  • [9] Applications of transformer-based language models in bioinformatics: a survey
    Zhang, Shuang
    Fan, Rui
    Liu, Yuti
    Chen, Shuang
    Liu, Qiao
    Zeng, Wanwen
    NEURO-ONCOLOGY ADVANCES, 2023, 5 (01)
  • [10] TAG: Gradient Attack on Transformer-based Language Models
    Deng, Jieren
    Wang, Yijue
    Li, Ji
    Wang, Chenghong
    Shang, Chao
    Liu, Hang
    Rajasekaran, Sanguthevar
    Ding, Caiwen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 3600 - 3610