Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Times Cited: 4
Authors
Wei, Kun [1 ]
Guo, Pengcheng [1 ]
Jiang, Ning [2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
End-to-end speech recognition; Transformer; Long context; Conversational ASR;
DOI
10.21437/Interspeech.2022-10066
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance by self-attention layers. However, for scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span across utterances. In this paper, we propose to explicitly model the inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of previous speech and incorporate such historical information into the current input by a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on historical linguistic information through a conditional decoder framework. We show the effectiveness of the proposed method on several open-source dialogue corpora, where it consistently improves performance over utterance-level Transformer-based ASR models.
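The abstract describes two mechanisms: a context-aware residual attention in the encoder that attends over encodings of preceding utterances, and a conditional decoder that conditions on previous transcripts. The following is a minimal PyTorch sketch of how the encoder-side idea could be realized; the module name, the learnable gate, and the residual fusion details are illustrative assumptions and not the paper's exact implementation.

# Illustrative sketch (assumed details, not the paper's exact code): self-attention
# over the current utterance plus cross-attention over cached encodings of previous
# utterances, fused through a gated residual connection.
import torch
import torch.nn as nn

class ContextAwareResidualAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Learnable scalar gate controlling how much historic context is mixed in (an assumption).
        self.gate = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (batch, T_cur, d_model) frames of the current utterance
        # context: (batch, T_ctx, d_model) cached encoder states of preceding utterances
        h, _ = self.self_attn(x, x, x)             # intra-utterance dependencies
        c, _ = self.ctx_attn(x, context, context)  # inter-sentential dependencies
        return self.norm(x + h + self.gate * c)    # residual fusion of both attention paths

# Toy usage: 2 dialogues, 50 current frames, 120 frames of accumulated history.
x = torch.randn(2, 50, 256)
ctx = torch.randn(2, 120, 256)
out = ContextAwareResidualAttention()(x, ctx)
print(out.shape)  # torch.Size([2, 50, 256])

The decoder-side conditioning described in the abstract could be sketched analogously, with the cached context being token embeddings of previous transcripts rather than acoustic encoder states.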
Pages: 3804-3808
Page count: 5
Related Papers
50 records in total
  • [1] TRANSFORMER-BASED STREAMING ASR WITH CUMULATIVE ATTENTION
    Li, Mohan
    Zhang, Shucong
    Zorila, Catalin
    Doddipatla, Rama
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8272 - 8276
  • [2] Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR
    Maekaku, Takashi
    Fujita, Yuya
    Peng, Yifan
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 1071 - 1075
  • [3] Improving Conversational Recommender Systems via Transformer-based Sequential Modelling
    Zou, Jie
    Kanoulas, Evangelos
    Ren, Pengjie
    Ren, Zhaochun
    Sun, Aixin
    Long, Cheng
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2319 - 2324
  • [4] Transformer-based multivariate time series anomaly detection using inter-variable attention mechanism
    Kang, Hyeongwon
    Kang, Pilsung
    KNOWLEDGE-BASED SYSTEMS, 2024, 290
  • [5] Improving scene text image captioning using transformer-based multilevel attention
    Srivastava, Swati
    Sharma, Himanshu
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (03)
  • [6] HEAD-SYNCHRONOUS DECODING FOR TRANSFORMER-BASED STREAMING ASR
    Li, Mohan
    Zorila, Catalin
    Doddipatla, Rama
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5909 - 5913
  • [7] A novel transformer-based network with attention mechanism for automatic pavement crack detection
    Guo, Feng
    Liu, Jian
    Lv, Chengshun
    Yu, Huayang
    CONSTRUCTION AND BUILDING MATERIALS, 2023, 391
  • [8] Attention Calibration for Transformer-based Sequential Recommendation
    Zhou, Peilin
    Ye, Qichen
    Xie, Yueqi
    Gao, Jingqi
    Wang, Shoujin
    Kim, Jae Boum
    You, Chenyu
    Kim, Sunghun
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 3595 - 3605
  • [9] Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models
    Zhang, Shucong
    Loweimi, Erfan
    Bell, Peter
    Renals, Steve
    INTERSPEECH 2021, 2021, : 2541 - 2545
  • [10] Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies
    Li, Zehan
    Miao, Haoran
    Deng, Keqi
    Cheng, Gaofeng
    Tian, Sanli
    Li, Ta
    Yan, Yonghong
    INTERSPEECH 2022, 2022, : 1671 - 1675