The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models

Cited by: 0
Authors
Wennberg, Ulme [1]
Henter, Gustav Eje [1]
Affiliation
[1] KTH Royal Inst Technol, Div Speech Mus & Hearing, Stockholm, Sweden
Source
ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2 | 2021
Keywords
DOI
Not available
CLC (Chinese Library Classification) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Mechanisms for encoding positional information are central for transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. Our findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks, while adding orders of magnitude fewer positional parameters.
Pages: 130-140
Number of pages: 11
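
The core idea described in the abstract, replacing absolute position embeddings with an attention bias that depends only on the relative offset between tokens, can be sketched as follows. This is a minimal illustrative sketch under assumed details: the radial-basis-function parameterization, the layer sizes, and all class and parameter names below are hypothetical, not the paper's exact TISA formulation.

```python
# Minimal sketch of translation-invariant self-attention (PyTorch).
# The attention logit between positions i and j is the usual content score
# plus a learned bias that depends only on the offset (j - i); the RBF
# kernel shape and all names here are illustrative assumptions.
import math
import torch
import torch.nn as nn

class TranslationInvariantSelfAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_kernels=5, max_offset=32):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # A handful of positional parameters per head replaces full position embeddings.
        self.amplitude = nn.Parameter(0.1 * torch.randn(n_heads, n_kernels))
        self.center = nn.Parameter(
            torch.linspace(-max_offset, max_offset, n_kernels).repeat(n_heads, 1))
        self.log_width = nn.Parameter(torch.zeros(n_heads, n_kernels))

    def positional_bias(self, seq_len, device):
        pos = torch.arange(seq_len, device=device)
        offsets = (pos[None, :] - pos[:, None]).float()           # offsets[i, j] = j - i
        d = offsets[None, None] - self.center[:, :, None, None]   # (H, K, L, L)
        rbf = torch.exp(-0.5 * (d / self.log_width.exp()[:, :, None, None]) ** 2)
        return (self.amplitude[:, :, None, None] * rbf).sum(dim=1)  # (H, L, L)

    def forward(self, x):
        b, L, _ = x.shape
        q, k, v = (t.view(b, L, self.n_heads, self.d_head).transpose(1, 2)
                   for t in self.qkv(x).chunk(3, dim=-1))
        logits = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        logits = logits + self.positional_bias(L, x.device)       # relative-position bias
        y = (logits.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, L, -1)
        return self.out(y)

x = torch.randn(2, 16, 64)                                # batch of 2, 16 tokens
print(TranslationInvariantSelfAttention()(x).shape)       # torch.Size([2, 16, 64])
```

Because the bias is a function of the offset j - i alone, shifting the whole input sequence leaves the attention pattern unchanged, and the number of positional parameters (here n_heads x n_kernels x 3) is independent of sequence length.
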
Related papers
50 items in total
  • [31] Slow dynamics in translation-invariant quantum lattice models
    Michailidis, Alexios A.
    Znidaric, Marko
    Medvedyeva, Mariya
    Abanin, Dmitry A.
    Prosen, Tomaz
    Papic, Z.
    PHYSICAL REVIEW B, 2018, 97 (10)
  • [32] AMMU: A survey of transformer-based biomedical pretrained language models
    Kalyan, Katikapalli Subramanyam
    Rajasekharan, Ajit
    Sangeetha, Sivanesan
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 126
  • [33] Transformer-based language models for mental health issues: A survey
    Greco, Candida M.
    Simeri, Andrea
    Tagarelli, Andrea
    Zumpano, Ester
    PATTERN RECOGNITION LETTERS, 2023, 167 : 204 - 211
  • [34] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [35] Transformer-Based Composite Language Models for Text Evaluation and Classification
    Skoric, Mihailo
    Utvic, Milos
    Stankovic, Ranka
    MATHEMATICS, 2023, 11 (22)
  • [36] Automatic text summarization using transformer-based language models
    Rao, Ritika
    Sharma, Sourabh
    Malik, Nitin
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (06) : 2599 - 2605
  • [37] CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese
    Rubel Schneider, Elisa Terumi
    Gumiel, Yohan Bonescki
    Andrioli de Souza, Joao Vitor
    Mukai, Lilian Mie
    Silva e Oliveira, Lucas Emanuel
    Rebelo, Marina de Sa
    Gutierrez, Marco Antonio
    Krieger, Jose Eduardo
    Teodoro, Douglas
    Moro, Claudia
    Paraiso, Emerson Cabrera
    2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS, 2023, : 378 - 381
  • [38] Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models
    Perez-Mayos, Laura
    Taboas Garcia, Alba
    Mille, Simon
    Wanner, Leo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3799 - 3812
  • [39] The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning
    Shen, Ke
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23419 - 23420
  • [40] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)