Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

Cited by: 17
Authors
Wu, Chunyang [1 ]
Wang, Yongqiang [1 ]
Shi, Yangyang [1 ]
Yeh, Ching-Feng [1 ]
Zhang, Frank [1 ]
Affiliations
[1] Facebook AI, Menlo Park, CA 94025 USA
Source
INTERSPEECH 2020
Keywords
streaming speech recognition; transformer; acoustic modeling;
DOI
10.21437/Interspeech.2020-2079
Chinese Library Classification
R36 [Pathology]; R76 [Otorhinolaryngology];
Subject Classification Codes
100104; 100213;
Abstract
Transformer-based acoustic modeling has achieved great success for both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full input sequence, and its computational cost grows quadratically with the sequence length. These factors limit its adoption for streaming applications. In this work, we propose a novel augmented-memory self-attention, which attends to a short segment of the input sequence and a bank of memories. The memory bank stores embeddings of all previously processed segments. On the LibriSpeech benchmark, our proposed method outperforms all existing streamable Transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on several large internal datasets.
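To make the mechanism concrete, below is a minimal PyTorch sketch of the idea described in the abstract. All names and shapes are illustrative assumptions, not the authors' implementation: the paper's left/right context frames are omitted, and mean pooling stands in for the learned segment summarization that fills the memory bank.

import torch
import torch.nn.functional as F

def augmented_memory_attention(segment, memory_bank, w_q, w_k, w_v):
    # Queries come from the current segment only; keys and values span
    # the memory bank plus the current segment.
    context = torch.cat([memory_bank, segment], dim=0)   # (n_mem + seg_len, d)
    q = segment @ w_q                                    # (seg_len, d)
    k = context @ w_k
    v = context @ w_v
    scores = q @ k.t() / (q.size(-1) ** 0.5)             # scaled dot-product
    out = F.softmax(scores, dim=-1) @ v                  # (seg_len, d)
    # Summarize the finished segment into one new memory slot; mean pooling
    # is an assumed stand-in for the paper's learned summarization query.
    memory_bank = torch.cat([memory_bank, out.mean(dim=0, keepdim=True)], dim=0)
    return out, memory_bank

d, seg_len = 64, 16
w_q, w_k, w_v = (torch.randn(d, d) * d ** -0.5 for _ in range(3))
memory = torch.zeros(0, d)                               # empty bank at stream start
for segment in torch.randn(4, seg_len, d):               # four streaming segments
    out, memory = augmented_memory_attention(segment, memory, w_q, w_k, w_v)

Because each query attends only to a fixed-size segment plus the memory bank, the per-segment cost stays bounded rather than growing quadratically with utterance length, which is what makes the model streamable.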
Pages: 2132-2136
Page count: 5
Related Papers
50 in total
  • [1] The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
    Wennberg, Ulme
    Henter, Gustav Eje
    ACL-IJCNLP 2021: THE 59TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS AND THE 11TH INTERNATIONAL JOINT CONFERENCE ON NATURAL LANGUAGE PROCESSING, VOL 2, 2021: 130-140
  • [2] Transformer-Based Models for Predicting Molecular Structures from Infrared Spectra Using Patch-Based Self-Attention
    Wu, Wenjin
    Leonardis, Aless
    Jiao, Jianbo
    Jiang, Jun
    Chen, Linjiang
    JOURNAL OF PHYSICAL CHEMISTRY A, 2025, 129 (08): 2077-2085
  • [3] Fixed Encoder Self-Attention Patterns in Transformer-Based Machine Translation
    Raganato, Alessandro
    Scherrer, Yves
    Tiedemann, Jorg
    FINDINGS OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, EMNLP 2020, 2020: 556-568
  • [4] Local-Global Self-Attention for Transformer-Based Object Tracking
    Chen, Langkun
    Gao, Long
    Jiang, Yan
    Li, Yunsong
    He, Gang
    Ning, Jifeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12): 12316-12329
  • [5] TRANSFORMER-BASED STREAMING ASR WITH CUMULATIVE ATTENTION
    Li, Mohan
    Zhang, Shucong
    Zorila, Catalin
    Doddipatla, Rama
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022: 8272-8276
  • [6] TripleFormer: improving transformer-based image classification method using multiple self-attention inputs
    Gong, Yu
    Wu, Peng
    Xu, Renjie
    Zhang, Xiaoming
    Wang, Tao
    Li, Xuan
    VISUAL COMPUTER, 2024, 40 (12): 9039-9050
  • [7] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021: 75-81
  • [8] Synthesizer: Rethinking Self-Attention for Transformer Models
    Tay, Yi
    Bahri, Dara
    Metzler, Donald
    Juan, Da-Cheng
    Zhao, Zhe
    Zheng, Che
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021: 7192-7203
  • [9] Transformer-based Acoustic Modeling for Streaming Speech Synthesis
    Wu, Chunyang
    Xiu, Zhiping
    Shi, Yangyang
    Kalinli, Ozlem
    Fuegen, Christian
    Koehler, Thilo
    He, Qing
    INTERSPEECH 2021, 2021: 146-150
  • [10] A transformer-based approach empowered by a self-attention technique for semantic segmentation in remote sensing
    Boulila, Wadii
    Ghandorh, Hamza
    Masood, Sharjeel
    Alzahem, Ayyub
    Koubaa, Anis
    Ahmed, Fawad
    Khan, Zahid
    Ahmad, Jawad
    HELIYON, 2024, 10 (08)