Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory

Cited by: 17
Authors:
Wu, Chunyang [1]
Wang, Yongqiang [1]
Shi, Yangyang [1]
Yeh, Ching-Feng [1]
Zhang, Frank [1]
Affiliations:
[1] Facebook AI, Menlo Park, CA 94025 USA
Source:
INTERSPEECH 2020
Keywords:
streaming speech recognition; transformer; acoustic modeling
DOI:
10.21437/Interspeech.2020-2079
CLC classification:
R36 [Pathology]; R76 [Otorhinolaryngology]
Subject classification codes:
100104; 100213
Abstract:
Transformer-based acoustic modeling has achieved great success in both hybrid and sequence-to-sequence speech recognition. However, it requires access to the full input sequence, and its computational cost grows quadratically with the input sequence length. These factors limit its adoption for streaming applications. In this work, we propose a novel augmented memory self-attention, which attends to a short segment of the input sequence and a bank of memories. The memory bank stores the embedding information for all the processed segments. On the LibriSpeech benchmark, our proposed method outperforms all existing streamable transformer methods by a large margin and achieves over 15% relative error reduction compared with the widely used LC-BLSTM baseline. Our findings are also confirmed on some large internal datasets.
Pages: 2132-2136
Page count: 5
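The mechanism summarized in the abstract (self-attention over the current short segment plus a bank of memory embeddings for already-processed segments) can be illustrated with a minimal sketch. The PyTorch code below is an illustration under stated assumptions, not the authors' implementation: the class name, default dimensions, and the mean-pool-plus-projection used to summarize each finished segment into a memory slot are invented for this sketch (the paper derives memory embeddings with a learned summarization mechanism).

# Minimal sketch of segment-wise self-attention with an augmented memory
# bank, as described in the abstract. Names, dimensions, and the mean-pool
# memory summarization are illustrative assumptions, not the paper's design.
import torch
import torch.nn as nn

class AugmentedMemoryAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, seg_len: int = 16):
        super().__init__()
        self.seg_len = seg_len
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Projects a finished segment's summary into one memory slot.
        self.mem_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model), processed one short segment at a time,
        # so per-step cost stays fixed instead of growing with utterance length.
        outputs, memory_bank = [], []
        for start in range(0, x.size(1), self.seg_len):
            seg = x[:, start:start + self.seg_len]  # current short segment
            # Keys/values: all stored memories plus the current segment.
            kv = torch.cat(memory_bank + [seg], dim=1) if memory_bank else seg
            out, _ = self.attn(seg, kv, kv)
            outputs.append(out)
            # Summarize the processed segment into a single memory vector
            # (mean-pool here; the paper uses a learned summarization query).
            memory_bank.append(self.mem_proj(seg.mean(dim=1, keepdim=True)))
        return torch.cat(outputs, dim=1)

# Usage: a 200-frame utterance already projected to d_model
# (the input projection is omitted for brevity).
model = AugmentedMemoryAttention()
y = model(torch.randn(1, 200, 256))  # -> (1, 200, 256)

Because each step attends only over one segment and the compact memory bank, the cost per frame no longer grows quadratically with sequence length, which is what makes this attention streamable.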
Related papers (50 in total):
  • [21] Relative molecule self-attention transformer
    Maziarka, Łukasz
    Majchrowski, Dawid
    Danel, Tomasz
    Gaiński, Piotr
    Tabor, Jacek
    Podolak, Igor
    Morkisz, Paweł
    Jastrzębski, Stanisław
    JOURNAL OF CHEMINFORMATICS, 2024, 16 (01)
  • [23] CSP-Former: A Transformer-Based Network for Point Cloud Analysis with Compressed Sensing and Spatial Self-Attention
    Zhong, Jiandan
    Jiang, Hongyu
    Ji, Yulin
    Li, Yingxiang
    Xue, Yajuan
    ELECTRONICS, 2025, 14 (02)
  • [24] STREAMING ATTENTION-BASED MODELS WITH AUGMENTED MEMORY FOR END-TO-END SPEECH RECOGNITION
    Yeh, Ching-Feng
    Wang, Yongqiang
    Shi, Yangyang
    Wu, Chunyang
    Zhang, Frank
    Chan, Julian
    Seltzer, Michael L.
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021: 8-14
  • [25] Spectral Superresolution Using Transformer with Convolutional Spectral Self-Attention
    Liao, Xiaomei
    He, Lirong
    Mao, Jiayou
    Xu, Meng
    REMOTE SENSING, 2024, 16 (10)
  • [26] Self-Attention Memory-Augmented Wavelet-CNN for Anomaly Detection
    Wu, Kun
    Zhu, Lei
    Shi, Weihang
    Wang, Wenwu
    Wu, Jin
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (03): 1374-1385
  • [27] Infectious disease time series modelling using transformer self-attention based network
    Prakash, Satya
    Jalal, Anand Singh
    Pathak, Pooja
    ENGINEERING RESEARCH EXPRESS, 2025, 7 (01)
  • [28] Universal Graph Transformer Self-Attention Networks
    Nguyen, Dai Quoc
    Nguyen, Tu Dinh
    Phung, Dinh
    COMPANION PROCEEDINGS OF THE WEB CONFERENCE 2022, WWW 2022 COMPANION, 2022: 193-196
  • [29] Sparse self-attention transformer for image inpainting
    Huang, Wenli
    Deng, Ye
    Hui, Siqi
    Wu, Yang
    Zhou, Sanping
    Wang, Jinjun
    PATTERN RECOGNITION, 2024, 145
  • [30] SST: self-attention transformer for infrared deconvolution
    Gao, Lei
    Yan, Xiaohong
    Deng, Lizhen
    Xu, Guoxia
    Zhu, Hu
    INFRARED PHYSICS & TECHNOLOGY, 2024, 140