The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models

Cited by: 0
Authors:
Wennberg, Ulme [1]
Henter, Gustav Eje [1]
Affiliations:
[1] KTH Royal Inst Technol, Div Speech Mus & Hearing, Stockholm, Sweden
Keywords: (none listed)
DOI: not available
CLC classification: TP18 [Theory of Artificial Intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Mechanisms for encoding positional information are central to transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. Our findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks, while adding orders of magnitude fewer positional parameters.
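As a rough illustration of the idea described in the abstract (a positional term in self-attention that depends only on the token offset j - i rather than on absolute positions), the PyTorch sketch below adds a learned per-offset bias to the attention logits. The class name, the lookup-table parameterization, and the hyperparameters are illustrative assumptions and not the paper's exact formulation; TISA itself parameterizes the positional term differently.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TranslationInvariantSelfAttention(nn.Module):
    """Single-head self-attention whose positional term depends only on
    the offset j - i, so shifting the whole sequence leaves it unchanged.
    Illustrative sketch: the lookup-table bias used here is an assumption,
    not the parameterization proposed in the paper."""

    def __init__(self, d_model: int, max_offset: int = 128):
        super().__init__()
        self.query = nn.Linear(d_model, d_model)
        self.key = nn.Linear(d_model, d_model)
        self.value = nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5
        self.max_offset = max_offset
        # One learnable scalar per offset in [-max_offset, max_offset].
        self.pos_bias = nn.Parameter(torch.zeros(2 * max_offset + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        n = x.size(1)
        q, k, v = self.query(x), self.key(x), self.value(x)
        content_scores = torch.matmul(q, k.transpose(-2, -1)) * self.scale
        # Offset matrix: entry (i, j) = j - i, clipped to the table range.
        pos = torch.arange(n, device=x.device)
        offsets = (pos[None, :] - pos[:, None]).clamp(-self.max_offset, self.max_offset)
        # The positional term is a function of the offset alone, hence translation-invariant.
        scores = content_scores + self.pos_bias[offsets + self.max_offset]
        attn = F.softmax(scores, dim=-1)
        return torch.matmul(attn, v)
```

Because the bias depends only on the offset, the positional parameter count here grows with max_offset rather than with sequence length times model width, which is in keeping with the abstract's claim of orders of magnitude fewer positional parameters than learned absolute embeddings.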
Pages: 130 - 140
Number of pages: 11
Related papers
50 records in total
  • [1] Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
    Raganato, Alessandro
    Scherrer, Yves
    Tiedemann, Jorg
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020, : 556 - 568
  • [2] Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
    Wu, Chunyang
    Wang, Yongqiang
    Shi, Yangyang
    Yeh, Ching-Feng
    Zhang, Frank
    INTERSPEECH 2020, 2020, : 2132 - 2136
  • [3] Local-Global Self-Attention for Transformer-Based Object Tracking
    Chen, Langkun
    Gao, Long
    Jiang, Yan
    Li, Yunsong
    He, Gang
    Ning, Jifeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 12316 - 12329
  • [4] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81
  • [5] Re-Transformer: A Self-Attention Based Model for Machine Translation
    Liu, Huey-Ing
    Chen, Wei-Lin
    AI IN COMPUTATIONAL LINGUISTICS, 2021, 189 : 3 - 10
  • [6] Synthesizer: Rethinking Self-Attention for Transformer Models
    Tay, Yi
    Bahri, Dara
    Metzler, Donald
    Juan, Da-Cheng
    Zhao, Zhe
    Zheng, Che
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139 : 7192 - 7203
  • [7] Roles and Utilization of Attention Heads in Transformer-based Neural Language Models
    Jo, Jae-young
    Myaeng, Sung-hyon
    58TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS (ACL 2020), 2020, : 3404 - 3417
  • [8] Transformer-Based Models for Predicting Molecular Structures from Infrared Spectra Using Patch-Based Self-Attention
    Wu, Wenjin
    Leonardis, Aless
    Jiao, Jianbo
    Jiang, Jun
    Chen, Linjiang
    JOURNAL OF PHYSICAL CHEMISTRY A, 2025, 129 (08): : 2077 - 2085
  • [9] A transformer-based approach empowered by a self-attention technique for semantic segmentation in remote sensing
    Boulila, Wadii
    Ghandorh, Hamza
    Masood, Sharjeel
    Alzahem, Ayyub
    Koubaa, Anis
    Ahmed, Fawad
    Khan, Zahid
    Ahmad, Jawad
    HELIYON, 2024, 10 (08)
  • [10] Transformer-Based Dual-Channel Self-Attention for UUV Autonomous Collision Avoidance
    Lin, Changjian
    Cheng, Yuhu
    Wang, Xuesong
    Yuan, Jianya
    Wang, Guoqing
    IEEE TRANSACTIONS ON INTELLIGENT VEHICLES, 2023, 8 (03): : 2319 - 2331