Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Cited: 4
Authors
Wei, Kun [1 ]
Guo, Pengcheng [1 ]
Jiang, Ning [2 ]
Institutions
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
End-to-end speech recognition; Transformer; Long context; Conversational ASR;
DOI
10.21437/Interspeech.2022-10066
CLC Classification Number
O42 [Acoustics];
Discipline Code
070206 ; 082403 ;
Abstract
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance by self-attention layers. However, in scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span across utterances. In this paper, we propose to explicitly model inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of previous speech and incorporate such historic information into the current input by a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on historic linguistic information through a conditional decoder framework. We show the effectiveness of our proposed method on several open-source dialogue corpora, and the proposed method consistently improves performance over utterance-level Transformer-based ASR models.
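The core encoder idea described above can be illustrated with a minimal NumPy sketch: queries from the current utterance attend over a key/value set formed by concatenating context frames from previous utterances with the current frames, and the attended context is added back through a residual connection. This is only an illustrative sketch of the general mechanism; the paper's exact formulation, gating, and layer placement may differ, and all names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_aware_attention(current, context, d_k):
    """Sketch of context-aware residual attention.

    current: (T_cur, d_k) frames of the current utterance (queries).
    context: (T_ctx, d_k) cached frames from previous utterances.
    Queries attend over [context; current]; the result is added
    residually to the current-utterance representation.
    """
    keys = np.concatenate([context, current], axis=0)      # (T_ctx + T_cur, d_k)
    scores = current @ keys.T / np.sqrt(d_k)               # (T_cur, T_ctx + T_cur)
    attended = softmax(scores, axis=-1) @ keys             # (T_cur, d_k)
    return current + attended                              # residual connection

# Hypothetical usage with random features:
cur = np.random.randn(5, 8)    # 5 frames of the current utterance
ctx = np.random.randn(12, 8)   # 12 cached frames from earlier turns
out = context_aware_attention(cur, ctx, d_k=8)
print(out.shape)               # (5, 8): same shape as the current utterance
```

In a real model the queries, keys, and values would pass through learned projections and multiple heads; the sketch keeps only the inter-sentential key/value concatenation and the residual add, which is the part that distinguishes this mechanism from plain utterance-level self-attention.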
Pages: 3804 - 3808
Page count: 5
Related Papers
50 records in total
  • [41] Transformer-based multi-level attention integration network for video saliency prediction
    Rui Tan
    Minghui Sun
    Yanhua Liang
    Multimedia Tools and Applications, 2025, 84 (13) : 11833 - 11854
  • [42] TSMCF: Transformer-Based SAR and Multispectral Cross-Attention Fusion for Cloud Removal
    Zhu, Hongming
    Wang, Zeju
    Han, Letong
    Xu, Manxin
    Li, Weiqi
    Liu, Qin
    Liu, Sicong
    Du, Bowen
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 6710 - 6720
  • [43] Transformer-based Cross attention and Feature Diversity for Occluded Person Re-identification
    Kang S.
    Kim S.
    Seo K.
    Transactions of the Korean Institute of Electrical Engineers, 2023, 72 (01) : 108 - 113
  • [44] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
    Xuqiang Zhuang
    Fangai Liu
    Jian Hou
    Jianhua Hao
    Xiaohong Cai
    Neural Processing Letters, 2022, 54 : 1943 - 1960
  • [45] Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
    Wu, Chunyang
    Wang, Yongqiang
    Shi, Yangyang
    Yeh, Ching-Feng
    Zhang, Frank
    INTERSPEECH 2020, 2020, : 2132 - 2136
  • [46] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
    Zhuang, Xuqiang
    Liu, Fangai
    Hou, Jian
    Hao, Jianhua
    Cai, Xiaohong
    NEURAL PROCESSING LETTERS, 2022, 54 (03) : 1943 - 1960
  • [47] Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR
    Zhou, Xinyuan
    Lee, Grandee
    Yilmaz, Emre
    Long, Yanhua
    Liang, Jiaen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 5016 - 5020
  • [48] Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition
    Lee, Mun-Hak
    Lee, Sang-Eon
    Seong, Ju-Seok
    Chang, Joon-Hyuk
    Kwon, Haeyoung
    Park, Chanhee
    INTERSPEECH 2022, 2022, : 56 - 60
  • [49] RLFAT: A Transformer-Based Relay Link Forged Attack Detection Mechanism in SDN
    Zhang, Tianyi
    Wang, Yong
    ELECTRONICS, 2023, 12 (10)
  • [50] Skin Lesion Segmentation Improved by Transformer-Based Networks with Inter-scale Dependency Modeling
    Eskandari, Sania
    Lumpp, Janet
    Giraldo, Luis Sanchez
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I, 2024, 14348 : 351 - 360