Segmented Recurrent Transformer: An Efficient Sequence-to-Sequence Model

Cited by: 0
Authors
Long, Yinghan [1 ]
Chowdhury, Sayeed Shafayet [1 ]
Roy, Kaushik [1 ]
Affiliation
[1] Purdue University, West Lafayette, IN 47907, USA
Source
Findings of the Association for Computational Linguistics (EMNLP 2023), 2023
Keywords
DOI
Not available
Chinese Library Classification (CLC) code
TP18 [Theory of Artificial Intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Transformers have shown dominant performance across a range of domains including language and vision. However, their computational cost grows quadratically with the sequence length, making their usage prohibitive for resource-constrained applications. To counter this, our approach is to divide the whole sequence into segments and apply attention to the individual segments. We propose a segmented recurrent transformer (SRformer) that combines segmented (local) attention with recurrent attention. The loss caused by reducing the attention window length is compensated for by aggregating information across segments with recurrent attention. SRformer leverages Recurrent Accumulate-and-Fire (RAF) neurons' inherent memory to update the cumulative product of keys and values. The segmented attention and lightweight RAF neurons ensure the efficiency of the proposed transformer. Such an approach leads to models with sequential processing capability at a lower computation/memory cost. We apply the proposed method to T5 and BART transformers. The modified models are tested on summarization datasets including CNN-DailyMail, XSUM, ArXiv, and MediaSUM. Notably, using segmented inputs of varied sizes, the proposed model achieves 6-22% higher ROUGE-1 scores than a segmented transformer and outperforms other recurrent transformer approaches. Furthermore, compared to full attention, the proposed model reduces the computational complexity of cross-attention by around 40%.
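The abstract only describes the mechanism at a high level; the snippet below is a minimal, hypothetical PyTorch sketch of the general idea (ordinary attention within each segment, plus a recurrent key/value memory carried across segments), not the authors' SRformer/RAF implementation. All names and defaults here (`segmented_recurrent_attention`, `segment_size`, `decay`, the linear-attention-style feature map) are assumptions made for illustration.

```python
# Illustrative sketch only: local attention inside each segment, plus a recurrent
# key/value memory that aggregates information from earlier segments. This is NOT
# the paper's SRformer/RAF implementation; it only mirrors the idea in the abstract.
import torch
import torch.nn.functional as F


def _phi(x):
    # Positive feature map (linear-attention style) so the normalizer stays positive.
    return F.elu(x) + 1.0


def segmented_recurrent_attention(q, k, v, segment_size=64, decay=0.9):
    """q, k, v: tensors of shape (batch, seq_len, d). Returns (batch, seq_len, d)."""
    b, n, d = q.shape
    kv_state = torch.zeros(b, d, d, device=q.device)  # running sum of phi(k)^T v
    z_state = torch.zeros(b, 1, d, device=q.device)   # running sum of phi(k)
    outputs = []
    for start in range(0, n, segment_size):
        qs, ks, vs = (t[:, start:start + segment_size] for t in (q, k, v))
        # Local (segmented) attention: quadratic cost only in the segment length.
        local = F.scaled_dot_product_attention(qs, ks, vs)
        # Contribution from all previous segments, read out of the recurrent memory.
        num = _phi(qs) @ kv_state                                 # (b, s, d)
        den = (_phi(qs) * z_state).sum(-1, keepdim=True) + 1e-6   # (b, s, 1)
        outputs.append(local + num / den)
        # Update the recurrent memory with this segment's keys and values
        # (a simple stand-in for the RAF neurons' accumulated state).
        kv_state = decay * kv_state + _phi(ks).transpose(1, 2) @ vs
        z_state = decay * z_state + _phi(ks).sum(dim=1, keepdim=True)
    return torch.cat(outputs, dim=1)
```

Under these assumptions, each segment costs O(s^2) attention plus an O(d^2) state update, so the total cost grows linearly in the number of segments rather than quadratically in the full sequence length; a call such as segmented_recurrent_attention(torch.randn(2, 256, 64), torch.randn(2, 256, 64), torch.randn(2, 256, 64)) returns a (2, 256, 64) tensor.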
Pages: 8325-8337
Number of pages: 13