The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models

Cited by: 0
Authors
Wennberg, Ulme [1]
Henter, Gustav Eje [1]
Affiliation
[1] KTH Royal Inst Technol, Div Speech Mus & Hearing, Stockholm, Sweden
Source
ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2 | 2021
Keywords
DOI
Not available
CLC (Chinese Library Classification) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Mechanisms for encoding positional information are central for transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. Our findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks, while adding orders of magnitude fewer positional parameters.
Pages: 130-140
Number of pages: 11
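
The core idea described in the abstract, replacing absolute position embeddings with an attention bias that depends only on the relative offset between tokens, can be sketched as follows. This is a minimal illustrative sketch under assumed details: the radial-basis-function parameterization, the layer sizes, and all class and parameter names below are hypothetical, not the paper's exact TISA formulation.

```python
# Minimal sketch of translation-invariant self-attention (PyTorch).
# The attention logit between positions i and j is the usual content score
# plus a learned bias that depends only on the offset (j - i); the RBF
# kernel shape and all names here are illustrative assumptions.
import math
import torch
import torch.nn as nn

class TranslationInvariantSelfAttention(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_kernels=5, max_offset=32):
        super().__init__()
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # A handful of positional parameters per head replaces full position embeddings.
        self.amplitude = nn.Parameter(0.1 * torch.randn(n_heads, n_kernels))
        self.center = nn.Parameter(
            torch.linspace(-max_offset, max_offset, n_kernels).repeat(n_heads, 1))
        self.log_width = nn.Parameter(torch.zeros(n_heads, n_kernels))

    def positional_bias(self, seq_len, device):
        pos = torch.arange(seq_len, device=device)
        offsets = (pos[None, :] - pos[:, None]).float()           # offsets[i, j] = j - i
        d = offsets[None, None] - self.center[:, :, None, None]   # (H, K, L, L)
        rbf = torch.exp(-0.5 * (d / self.log_width.exp()[:, :, None, None]) ** 2)
        return (self.amplitude[:, :, None, None] * rbf).sum(dim=1)  # (H, L, L)

    def forward(self, x):
        b, L, _ = x.shape
        q, k, v = (t.view(b, L, self.n_heads, self.d_head).transpose(1, 2)
                   for t in self.qkv(x).chunk(3, dim=-1))
        logits = q @ k.transpose(-2, -1) / math.sqrt(self.d_head)
        logits = logits + self.positional_bias(L, x.device)       # relative-position bias
        y = (logits.softmax(dim=-1) @ v).transpose(1, 2).reshape(b, L, -1)
        return self.out(y)

x = torch.randn(2, 16, 64)                                # batch of 2, 16 tokens
print(TranslationInvariantSelfAttention()(x).shape)       # torch.Size([2, 16, 64])
```

Because the bias is a function of the offset j - i alone, shifting the whole input sequence leaves the attention pattern unchanged, and the number of positional parameters (here n_heads x n_kernels x 3) is independent of sequence length.
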
Related papers
50 items in total
  • [31] Slow dynamics in translation-invariant quantum lattice models
    Michailidis, Alexios A.
    Znidaric, Marko
    Medvedyeva, Mariya
    Abanin, Dmitry A.
    Prosen, Tomaz
    Papic, Z.
    PHYSICAL REVIEW B, 2018, 97 (10)
  • [32] AMMU: A survey of transformer-based biomedical pretrained language models
    Kalyan, Katikapalli Subramanyam
    Rajasekharan, Ajit
    Sangeetha, Sivanesan
    JOURNAL OF BIOMEDICAL INFORMATICS, 2022, 126
  • [33] Transformer-based language models for mental health issues: A survey
    Greco, Candida M.
    Simeri, Andrea
    Tagarelli, Andrea
    Zumpano, Ester
    PATTERN RECOGNITION LETTERS, 2023, 167 : 204 - 211
  • [34] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)
  • [35] Transformer-Based Composite Language Models for Text Evaluation and Classification
    Skoric, Mihailo
    Utvic, Milos
    Stankovic, Ranka
    MATHEMATICS, 2023, 11 (22)
  • [36] Automatic text summarization using transformer-based language models
    Rao, Ritika
    Sharma, Sourabh
    Malik, Nitin
    INTERNATIONAL JOURNAL OF SYSTEM ASSURANCE ENGINEERING AND MANAGEMENT, 2024, 15 (06) : 2599 - 2605
  • [37] CardioBERTpt: Transformer-based Models for Cardiology Language Representation in Portuguese
    Rubel Schneider, Elisa Terumi
    Gumiel, Yohan Bonescki
    Andrioli de Souza, Joao Vitor
    Mukai, Lilian Mie
    Silva e Oliveira, Lucas Emanuel
    Rebelo, Marina de Sa
    Gutierrez, Marco Antonio
    Krieger, Jose Eduardo
    Teodoro, Douglas
    Moro, Claudia
    Paraiso, Emerson Cabrera
    2023 IEEE 36TH INTERNATIONAL SYMPOSIUM ON COMPUTER-BASED MEDICAL SYSTEMS, CBMS, 2023, : 378 - 381
  • [38] Assessing the Syntactic Capabilities of Transformer-based Multilingual Language Models
    Perez-Mayos, Laura
    Taboas Garcia, Alba
    Mille, Simon
    Wanner, Leo
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, ACL-IJCNLP 2021, 2021, : 3799 - 3812
  • [39] The Generalization and Robustness of Transformer-Based Language Models on Commonsense Reasoning
    Shen, Ke
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 21, 2024, : 23419 - 23420
  • [40] Pre-trained transformer-based language models for Sundanese
    Wongso, Wilson
    Lucky, Henry
    Suhartono, Derwin
    JOURNAL OF BIG DATA, 2022, 9 (01)