Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

Cited by: 0
Authors
Long, Yinghan [1 ]
Chowdhury, Sayeed Shafayet [1 ]
Roy, Kaushik [1 ]
Affiliation
[1] Purdue University, West Lafayette, IN 47907, USA
Source
Findings of the Association for Computational Linguistics (EMNLP 2023), 2023
Keywords
DOI
Not available
Chinese Library Classification (CLC) code
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments and apply attention to the individual segments. We propose a segmented recurrent transformer (SRformer) that combines segmented (local) attention with recurrent attention. The loss caused by reducing the attention window length is compensated for by aggregating information across segments with recurrent attention. SRformer leverages Recurrent Accumulate-and-Fire (RAF) neurons' inherent memory to update the cumulative product of keys and values. The segmented attention and lightweight RAF neurons ensure the efficiency of the proposed transformer. Such an approach leads to models with sequential processing capability at a lower computation/memory cost. We apply the proposed method to T5 and BART transformers. The modified models are tested on summarization datasets including CNN-DailyMail, XSUM, ArXiv, and MediaSUM. Notably, using segmented inputs of varied sizes, the proposed model achieves 6-22% higher ROUGE-1 scores than a segmented transformer and outperforms other recurrent transformer approaches. Furthermore, compared to full attention, the proposed model reduces the computational complexity of cross-attention by around 40%.
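The abstract only describes the mechanism at a high level; the snippet below is a minimal, hypothetical PyTorch sketch of the general idea (ordinary attention within each segment, plus a recurrent key/value memory carried across segments), not the authors' SRformer/RAF implementation. All names and defaults here (`segmented_recurrent_attention`, `segment_size`, `decay`, the linear-attention-style feature map) are assumptions made for illustration.

```python
# Illustrative sketch only: local attention inside each segment, plus a recurrent
# key/value memory that aggregates information from earlier segments. This is NOT
# the paper's SRformer/RAF implementation; it only mirrors the idea in the abstract.
import torch
import torch.nn.functional as F


def _phi(x):
    # Positive feature map (linear-attention style) so the normalizer stays positive.
    return F.elu(x) + 1.0


def segmented_recurrent_attention(q, k, v, segment_size=64, decay=0.9):
    """q, k, v: tensors of shape (batch, seq_len, d). Returns (batch, seq_len, d)."""
    b, n, d = q.shape
    kv_state = torch.zeros(b, d, d, device=q.device)  # running sum of phi(k)^T v
    z_state = torch.zeros(b, 1, d, device=q.device)   # running sum of phi(k)
    outputs = []
    for start in range(0, n, segment_size):
        qs, ks, vs = (t[:, start:start + segment_size] for t in (q, k, v))
        # Local (segmented) attention: quadratic cost only in the segment length.
        local = F.scaled_dot_product_attention(qs, ks, vs)
        # Contribution from all previous segments, read out of the recurrent memory.
        num = _phi(qs) @ kv_state                                 # (b, s, d)
        den = (_phi(qs) * z_state).sum(-1, keepdim=True) + 1e-6   # (b, s, 1)
        outputs.append(local + num / den)
        # Update the recurrent memory with this segment's keys and values
        # (a simple stand-in for the RAF neurons' accumulated state).
        kv_state = decay * kv_state + _phi(ks).transpose(1, 2) @ vs
        z_state = decay * z_state + _phi(ks).sum(dim=1, keepdim=True)
    return torch.cat(outputs, dim=1)
```

Under these assumptions, each segment costs O(s^2) attention plus an O(d^2) state update, so the total cost grows linearly in the number of segments rather than quadratically in the full sequence length; a call such as segmented_recurrent_attention(torch.randn(2, 256, 64), torch.randn(2, 256, 64), torch.randn(2, 256, 64)) returns a (2, 256, 64) tensor.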
Pages: 8325-8337
Number of pages: 13