Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Cited by: 4
Authors
Wei, Kun [1 ]
Guo, Pengcheng [1 ]
Jiang, Ning [2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
End-to-end speech recognition; Transformer; Long context; Conversational ASR;
DOI
10.21437/Interspeech.2022-10066
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Code
070206; 082403;
Abstract
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance via self-attention layers. However, for scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span across utterances. In this paper, we propose to explicitly model the inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of previous speech and incorporate such historical information into the current input by a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on the historical linguistic information through a conditional decoder framework. We show the effectiveness of the proposed method on several open-source dialogue corpora, and it consistently improves performance over utterance-level Transformer-based ASR models.
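As a rough illustration of the encoder-side idea described in the abstract, the sketch below shows how features of the current utterance can attend to cached embeddings of previous utterances and be merged back through a residual connection. This is a minimal example, not the authors' implementation: the module name, the use of one pooled embedding per past utterance, the dimensions, and the single cross-attention layer are all assumptions made for this sketch.

```python
# Minimal sketch (assumed design, not the paper's exact model) of a
# context-aware residual attention block for conversational ASR encoders.
import torch
import torch.nn as nn


class ContextAwareResidualAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        # Cross-attention: current-utterance frames are the queries,
        # context embeddings of previous utterances are the keys/values.
        self.cross_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (batch, T, d_model) encoder features of the current utterance
        # context: (batch, N, d_model) one embedding per previous utterance (assumed pooling)
        attended, _ = self.cross_attn(query=x, key=context, value=context)
        # Residual merge: keep the utterance-level features and add historical info.
        return self.norm(x + self.dropout(attended))


if __name__ == "__main__":
    layer = ContextAwareResidualAttention(d_model=256, n_heads=4)
    current = torch.randn(2, 100, 256)   # 100 frames of the current utterance
    history = torch.randn(2, 3, 256)     # pooled embeddings of 3 past utterances
    print(layer(current, history).shape)  # torch.Size([2, 100, 256])
```

The residual form means that when the historical context is uninformative, the block can fall back to the plain utterance-level representation; the decoder-side conditioning described in the abstract would be handled separately and is not sketched here.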
Pages: 3804-3808
Number of pages: 5
Related Papers
50 records in total
  • [31] Local-Global Self-Attention for Transformer-Based Object Tracking
    Chen, Langkun
    Gao, Long
    Jiang, Yan
    Li, Yunsong
    He, Gang
    Ning, Jifeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 12316 - 12329
  • [32] SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding
    Qu, Mengxue
    Wu, Yu
    Liu, Wu
    Gong, Qiqi
    Liang, Xiaodan
    Russakovsky, Olga
    Zhao, Yao
    Wei, Yunchao
    COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 546 - 562
  • [33] DDosTC: A Transformer-Based Network Attack Detection Hybrid Mechanism in SDN
    Wang, Haomin
    Li, Wei
    SENSORS, 2021, 21 (15)
  • [34] Improving transformer-based acoustic model performance using sequence discriminative training
    Lee, Chae-Won
    Chang, Joon-Hyuk
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2022, 41 (03): : 335 - 341
  • [35] Attention Fusion of Transformer-Based and Scale-Based Method for Hyperspectral and LiDAR Joint Classification
    Zhang, Maqun
    Gao, Feng
    Zhang, Tiange
    Gan, Yanhai
    Dong, Junyu
    Yu, Hui
    REMOTE SENSING, 2023, 15 (03)
  • [36] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [37] Local Attention Transformer-Based Full-View Finger-Vein Identification
    Qin, Huafeng
    Hu, Rongshan
    El-Yacoubi, Mounim A.
    Li, Yantao
    Gao, Xinbo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (06) : 2767 - 2782
  • [38] ETMA: Efficient Transformer-Based Multilevel Attention Framework for Multimodal Fake News Detection
    Yadav, Ashima
    Gaba, Shivani
    Khan, Haneef
    Budhiraja, Ishan
    Singh, Akansha
    Singh, Krishna Kant
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (04) : 5015 - 5027
  • [39] Understanding the PULSAR effect in combined radiotherapy and immunotherapy using transformer-based attention mechanisms
    Peng, Hao
    Moore, Casey
    Saha, Debabrata
    Jiang, Steve
    Timmerman, Robert
    FRONTIERS IN ONCOLOGY, 2024, 14
  • [40] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81