Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Cited by: 4
Authors
Wei, Kun [1 ]
Guo, Pengcheng [1 ]
Jiang, Ning [2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xi'an, Peoples R China
[2] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
End-to-end speech recognition; Transformer; Long context; Conversational ASR;
DOI
10.21437/Interspeech.2022-10066
CLC Number
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance through self-attention layers. However, in scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span across utterances. In this paper, we propose to explicitly model the inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of preceding speech and incorporate this historical information into the current input through a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on historical linguistic information through a conditional decoder framework. We show the effectiveness of the proposed method on several open-source dialogue corpora, where it consistently improves performance over utterance-level Transformer-based ASR models.
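To make the encoder-side mechanism concrete, below is a minimal PyTorch sketch of how a context-aware residual attention block might look: self-attention over the current utterance is fused, via a residual connection, with cross-attention over cached representations of previous utterances. The class name `ContextAwareResidualAttention`, the layer layout, and all hyperparameters are illustrative assumptions, not the authors' released implementation; the decoder-side conditioning would analogously feed historical token representations as the cross-attention memory.

```python
import torch
import torch.nn as nn


class ContextAwareResidualAttention(nn.Module):
    """Hypothetical sketch of a context-aware residual attention block.

    Self-attention over the current utterance is augmented by a
    cross-attention branch that attends to cached representations of
    previous utterances; both branches are fused through a residual
    connection. Names and sizes are assumptions for illustration.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (batch, T_cur, d_model) current utterance features
        # context: (batch, T_ctx, d_model) cached features of previous utterances
        intra, _ = self.self_attn(x, x, x)             # within-utterance dependencies
        inter, _ = self.ctx_attn(x, context, context)  # inter-sentential dependencies
        return self.norm(x + intra + inter)            # residual fusion of both branches


# Toy usage: a 50-frame current utterance with 120 frames of cached history.
layer = ContextAwareResidualAttention()
cur = torch.randn(2, 50, 256)
hist = torch.randn(2, 120, 256)
out = layer(cur, hist)
print(out.shape)  # torch.Size([2, 50, 256])
```

Under these assumptions, the cross-attention branch contributes nothing when no history is available (e.g., the first utterance of a dialogue), which a practical implementation would handle by masking or by passing a learned placeholder context.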
Pages: 3804 - 3808
Number of pages: 5