Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Cited by: 4
Authors
Wei, Kun [1 ]
Guo, Pengcheng [1 ]
Jiang, Ning [2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
End-to-end speech recognition; Transformer; Long context; Conversational ASR;
DOI
10.21437/Interspeech.2022-10066
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject Classification Code
070206; 082403;
Abstract
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance via self-attention layers. However, for scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span across utterances. In this paper, we propose to explicitly model the inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of previous speech and incorporate such historical information into the current input by a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on the historical linguistic information through a conditional decoder framework. We show the effectiveness of the proposed method on several open-source dialogue corpora, and it consistently improves performance over utterance-level Transformer-based ASR models.
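As a rough illustration of the encoder-side idea described in the abstract, the sketch below shows how features of the current utterance can attend to cached embeddings of previous utterances and be merged back through a residual connection. This is a minimal example, not the authors' implementation: the module name, the use of one pooled embedding per past utterance, the dimensions, and the single cross-attention layer are all assumptions made for this sketch.

```python
# Minimal sketch (assumed design, not the paper's exact model) of a
# context-aware residual attention block for conversational ASR encoders.
import torch
import torch.nn as nn


class ContextAwareResidualAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        # Cross-attention: current-utterance frames are the queries,
        # context embeddings of previous utterances are the keys/values.
        self.cross_attn = nn.MultiheadAttention(
            d_model, n_heads, dropout=dropout, batch_first=True
        )
        self.norm = nn.LayerNorm(d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (batch, T, d_model) encoder features of the current utterance
        # context: (batch, N, d_model) one embedding per previous utterance (assumed pooling)
        attended, _ = self.cross_attn(query=x, key=context, value=context)
        # Residual merge: keep the utterance-level features and add historical info.
        return self.norm(x + self.dropout(attended))


if __name__ == "__main__":
    layer = ContextAwareResidualAttention(d_model=256, n_heads=4)
    current = torch.randn(2, 100, 256)   # 100 frames of the current utterance
    history = torch.randn(2, 3, 256)     # pooled embeddings of 3 past utterances
    print(layer(current, history).shape)  # torch.Size([2, 100, 256])
```

The residual form means that when the historical context is uninformative, the block can fall back to the plain utterance-level representation; the decoder-side conditioning described in the abstract would be handled separately and is not sketched here.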
Pages: 3804-3808
Number of pages: 5
Related Papers
50 records in total
  • [31] Local-Global Self-Attention for Transformer-Based Object Tracking
    Chen, Langkun
    Gao, Long
    Jiang, Yan
    Li, Yunsong
    He, Gang
    Ning, Jifeng
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34 (12) : 12316 - 12329
  • [32] SiRi: A Simple Selective Retraining Mechanism for Transformer-Based Visual Grounding
    Qu, Mengxue
    Wu, Yu
    Liu, Wu
    Gong, Qiqi
    Liang, Xiaodan
    Russakovsky, Olga
    Zhao, Yao
    Wei, Yunchao
    COMPUTER VISION - ECCV 2022, PT XXXV, 2022, 13695 : 546 - 562
  • [33] DDosTC: A Transformer-Based Network Attack Detection Hybrid Mechanism in SDN
    Wang, Haomin
    Li, Wei
    SENSORS, 2021, 21 (15)
  • [34] Improving transformer-based acoustic model performance using sequence discriminative training
    Lee, Chae-Won
    Chang, Joon-Hyuk
    JOURNAL OF THE ACOUSTICAL SOCIETY OF KOREA, 2022, 41 (03): : 335 - 341
  • [35] Attention Fusion of Transformer-Based and Scale-Based Method for Hyperspectral and LiDAR Joint Classification
    Zhang, Maqun
    Gao, Feng
    Zhang, Tiange
    Gan, Yanhai
    Dong, Junyu
    Yu, Hui
    REMOTE SENSING, 2023, 15 (03)
  • [36] TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION WITH LOCAL DENSE SYNTHESIZER ATTENTION
    Xu, Menglong
    Li, Shengqiang
    Zhang, Xiao-Lei
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5899 - 5903
  • [37] Local Attention Transformer-Based Full-View Finger-Vein Identification
    Qin, Huafeng
    Hu, Rongshan
    El-Yacoubi, Mounim A.
    Li, Yantao
    Gao, Xinbo
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2023, 33 (06) : 2767 - 2782
  • [38] ETMA: Efficient Transformer-Based Multilevel Attention Framework for Multimodal Fake News Detection
    Yadav, Ashima
    Gaba, Shivani
    Khan, Haneef
    Budhiraja, Ishan
    Singh, Akansha
    Singh, Krishna Kant
    IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, 2024, 11 (04) : 5015 - 5027
  • [39] Understanding the PULSAR effect in combined radiotherapy and immunotherapy using transformer-based attention mechanisms
    Peng, Hao
    Moore, Casey
    Saha, Debabrata
    Jiang, Steve
    Timmerman, Robert
    FRONTIERS IN ONCOLOGY, 2024, 14
  • [40] SIMPLIFIED SELF-ATTENTION FOR TRANSFORMER-BASED END-TO-END SPEECH RECOGNITION
    Luo, Haoneng
    Zhang, Shiliang
    Lei, Ming
    Xie, Lei
    2021 IEEE SPOKEN LANGUAGE TECHNOLOGY WORKSHOP (SLT), 2021, : 75 - 81