Roles and Utilization of Attention Heads in Transformer-based Neural Language Models

Cited by: 0
Authors
Jo, Jae-young [1 ,2 ]
Myaeng, Sung-hyon [1 ]
Affiliations
[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea
[2] Dingbro AI Res, Daejeon, South Korea
Source
58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020) | 2020
Keywords
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Sentence encoders based on the transformer architecture have shown promising results on various natural language tasks. The main impetus lies in the pre-trained neural language models that capture long-range dependencies among words, owing to multi-head attention that is unique in the architecture. However, little is known about how linguistic properties are processed, represented, and utilized for downstream tasks among the hundreds of attention heads inside a pre-trained transformer-based model. With the initial goal of examining the roles of attention heads in handling a set of linguistic features, we conducted a set of experiments with ten probing tasks and three downstream tasks on four pre-trained transformer families (GPT, GPT2, BERT, and ELECTRA). Meaningful insights are revealed through heat-map visualization and used to propose a relatively simple sentence representation method that takes advantage of the most influential attention heads, resulting in additional performance improvements on the downstream tasks.
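The abstract describes probing individual attention heads of a pre-trained transformer and pooling a sentence vector from the heads found to be most influential. The following is a minimal sketch of that general idea, not the authors' released code: the model name ("bert-base-uncased" via the HuggingFace transformers library), the chosen (layer, head) pairs, the per-head slicing of layer outputs, and the mean-pooling step are all illustrative assumptions.

```python
# Sketch: inspect per-head structure of a pre-trained transformer and build a
# sentence vector from a hand-picked subset of heads. All head choices below
# are placeholders, not the heads identified in the paper.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

sentence = "Attention heads specialize in different linguistic properties."
inputs = tokenizer(sentence, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# outputs.attentions: one tensor per layer, each [batch, num_heads, seq, seq]
attentions = outputs.attentions
hidden_states = outputs.hidden_states  # embeddings + one tensor per layer

num_layers = len(attentions)
num_heads = attentions[0].size(1)
print(f"{num_layers} layers x {num_heads} heads = {num_layers * num_heads} attention heads")

# Hypothetical "influential" heads, e.g. ones scoring highest on probing tasks.
selected_heads = [(8, 3), (9, 7), (10, 1)]  # (layer, head) placeholders

head_dim = model.config.hidden_size // num_heads
pooled = []
for layer, head in selected_heads:
    # Slice the layer output into equal per-head segments (a rough proxy for a
    # head's contribution) and mean-pool the chosen slice over tokens.
    layer_output = hidden_states[layer + 1]        # [batch, seq_len, hidden]
    start, end = head * head_dim, (head + 1) * head_dim
    head_slice = layer_output[:, :, start:end]     # [batch, seq_len, head_dim]
    pooled.append(head_slice.mean(dim=1))          # [batch, head_dim]

sentence_vector = torch.cat(pooled, dim=-1)        # [batch, head_dim * num selected]
print(sentence_vector.shape)
```

In practice, the selected heads would be chosen by scoring each (layer, head) pair on probing or downstream tasks, as the abstract describes; the fixed indices above only stand in for that selection step.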
Pages: 3404-3417
Page count: 14