Roles and Utilization of Attention Heads in Transformer-based Neural Language Models

Citations: 0
Authors:
Jo, Jae-young [1, 2]
Myaeng, Sung-hyon [1]
Affiliations:
[1] Korea Adv Inst Sci & Technol, Sch Comp, Daejeon, South Korea
[2] Dingbro AI Res, Daejeon, South Korea
Source:
58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020
Keywords: (none listed)
DOI: not available
Chinese Library Classification (CLC):
TP18 [Artificial Intelligence Theory]
Discipline codes:
081104; 0812; 0835; 1405
Abstract:
Sentence encoders based on the transformer architecture have shown promising results on various natural language tasks. The main impetus lies in pre-trained neural language models that capture long-range dependencies among words, owing to multi-head attention, which is unique to the architecture. However, little is known about how linguistic properties are processed, represented, and utilized for downstream tasks among the hundreds of attention heads inside a pre-trained transformer-based model. With the initial goal of examining the roles of attention heads in handling a set of linguistic features, we conducted a set of experiments with ten probing tasks and three downstream tasks on four pre-trained transformer families (GPT, GPT2, BERT, and ELECTRA). Meaningful insights, revealed through heat-map visualization, are then used to propose a relatively simple sentence representation method that takes advantage of the most influential attention heads, resulting in additional performance improvements on the downstream tasks.
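The abstract describes building a sentence representation from the attention heads identified as most influential. Below is a minimal sketch (not the authors' released code) of one way such a representation could be assembled with the Hugging Face transformers library, assuming bert-base-uncased; the influential_heads list and the [CLS]-attention pooling are illustrative placeholders, not the paper's exact head-selection or pooling procedure.

    # Minimal sketch: pool token states with the attention of a few selected heads,
    # then average the per-head vectors into a single sentence representation.
    import torch
    from transformers import BertModel, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    model = BertModel.from_pretrained("bert-base-uncased", output_attentions=True)
    model.eval()

    sentence = "Attention heads play different linguistic roles."
    inputs = tokenizer(sentence, return_tensors="pt")

    with torch.no_grad():
        outputs = model(**inputs)

    hidden = outputs.last_hidden_state[0]   # (seq_len, hidden_dim) final-layer token states
    attentions = outputs.attentions         # tuple per layer: (1, num_heads, seq_len, seq_len)

    # Hypothetical result of a probing analysis: (layer, head) pairs judged influential.
    influential_heads = [(8, 3), (10, 7), (11, 1)]

    # For each selected head, take its attention distribution from the [CLS] position
    # and use it as pooling weights over the token states (a simplification: the paper
    # may combine head outputs differently).
    head_vectors = []
    for layer, head in influential_heads:
        cls_attention = attentions[layer][0, head, 0]   # (seq_len,) attention of [CLS] over tokens
        head_vectors.append(cls_attention @ hidden)     # attention-weighted sum of token states

    sentence_repr = torch.stack(head_vectors).mean(dim=0)  # (hidden_dim,) sentence vector
    print(sentence_repr.shape)

In this sketch the head-importance scores would come from the probing and downstream-task analysis described in the paper; here they are hard-coded only to make the pooling step concrete.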
Pages: 3404-3417
Page count: 14
Related papers (50 records in total):
  • [1] Is Transformer-Based Attention Agnostic of the Pretraining Language and Task?
    Martin, R. H. J.
    Visser, R.
    Dunaiski, M.
    SOUTH AFRICAN COMPUTER SCIENCE AND INFORMATION SYSTEMS RESEARCH TRENDS, SAICSIT 2024, 2024, 2159 : 95 - 123
  • [2] A Study on Performance Enhancement by Integrating Neural Topic Attention with Transformer-Based Language Model
    Um, Taehum
    Kim, Namhyoung
    APPLIED SCIENCES-BASEL, 2024, 14 (17):
  • [3] The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
    Wennberg, Ulme
    Henter, Gustav Eje
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021, : 130 - 140
  • [4] Ouroboros: On Accelerating Training of Transformer-Based Language Models
    Yang, Qian
    Huo, Zhouyuan
    Wang, Wenlin
    Huang, Heng
    Carin, Lawrence
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [5] Transformer-Based Language Models for Software Vulnerability Detection
    Thapa, Chandra
    Jang, Seung Ick
    Ahmed, Muhammad Ejaz
    Camtepe, Seyit
    Pieprzyk, Josef
    Nepal, Surya
    PROCEEDINGS OF THE 38TH ANNUAL COMPUTER SECURITY APPLICATIONS CONFERENCE, ACSAC 2022, 2022, : 481 - 496
  • [6] BERTAC: Enhancing Transformer-based Language Models with Adversarially Pretrained Convolutional Neural Networks
    Oh, Jong-Hoon
    Iida, Ryu
    Kloetzer, Julien
    Torisawa, Kentaro
    59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 1 (ACL-IJCNLP 2021), 2021, : 2103 - 2115
  • [7] A Comparison of Transformer-Based Language Models on NLP Benchmarks
    Greco, Candida Maria
    Tagarelli, Andrea
    Zumpano, Ester
    NATURAL LANGUAGE PROCESSING AND INFORMATION SYSTEMS (NLDB 2022), 2022, 13286 : 490 - 501
  • [8] RadBERT: Adapting Transformer-based Language Models to Radiology
    Yan, An
    McAuley, Julian
    Lu, Xing
    Du, Jiang
    Chang, Eric Y.
    Gentili, Amilcare
    Hsu, Chun-Nan
    RADIOLOGY-ARTIFICIAL INTELLIGENCE, 2022, 4 (04)
  • [9] Applications of transformer-based language models in bioinformatics: a survey
    Zhang, Shuang
    Fan, Rui
    Liu, Yuti
    Chen, Shuang
    Liu, Qiao
    Zeng, Wanwen
    NEURO-ONCOLOGY ADVANCES, 2023, 5 (01)
  • [10] TAG: Gradient Attack on Transformer-based Language Models
    Deng, Jieren
    Wang, Yijue
    Li, Ji
    Wang, Chenghong
    Shang, Chao
    Liu, Hang
    Rajasekaran, Sanguthevar
    Ding, Caiwen
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2021, 2021, : 3600 - 3610