Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Cited: 4
Authors
Wei, Kun [1 ]
Guo, Pengcheng [1 ]
Jiang, Ning [2 ]
Institutions
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
End-to-end speech recognition; Transformer; Long context; Conversational ASR;
DOI
10.21437/Interspeech.2022-10066
CLC Classification Number
O42 [Acoustics];
Discipline Code
070206 ; 082403 ;
Abstract
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance by self-attention layers. However, in scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span across utterances. In this paper, we propose to explicitly model inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of previous speech and incorporate such historic information into the current input by a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on historic linguistic information through a conditional decoder framework. We show the effectiveness of our proposed method on several open-source dialogue corpora, and the proposed method consistently improves performance over utterance-level Transformer-based ASR models.
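The core encoder idea described above can be illustrated with a minimal NumPy sketch: queries from the current utterance attend over a key/value set formed by concatenating context frames from previous utterances with the current frames, and the attended context is added back through a residual connection. This is only an illustrative sketch of the general mechanism; the paper's exact formulation, gating, and layer placement may differ, and all names here are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def context_aware_attention(current, context, d_k):
    """Sketch of context-aware residual attention.

    current: (T_cur, d_k) frames of the current utterance (queries).
    context: (T_ctx, d_k) cached frames from previous utterances.
    Queries attend over [context; current]; the result is added
    residually to the current-utterance representation.
    """
    keys = np.concatenate([context, current], axis=0)      # (T_ctx + T_cur, d_k)
    scores = current @ keys.T / np.sqrt(d_k)               # (T_cur, T_ctx + T_cur)
    attended = softmax(scores, axis=-1) @ keys             # (T_cur, d_k)
    return current + attended                              # residual connection

# Hypothetical usage with random features:
cur = np.random.randn(5, 8)    # 5 frames of the current utterance
ctx = np.random.randn(12, 8)   # 12 cached frames from earlier turns
out = context_aware_attention(cur, ctx, d_k=8)
print(out.shape)               # (5, 8): same shape as the current utterance
```

In a real model the queries, keys, and values would pass through learned projections and multiple heads; the sketch keeps only the inter-sentential key/value concatenation and the residual add, which is the part that distinguishes this mechanism from plain utterance-level self-attention.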
Pages: 3804 - 3808
Page count: 5
Related Papers
50 records in total
  • [41] Transformer-based multi-level attention integration network for video saliency prediction
    Rui Tan
    Minghui Sun
    Yanhua Liang
    Multimedia Tools and Applications, 2025, 84 (13) : 11833 - 11854
  • [42] TSMCF: Transformer-Based SAR and Multispectral Cross-Attention Fusion for Cloud Removal
    Zhu, Hongming
    Wang, Zeju
    Han, Letong
    Xu, Manxin
    Li, Weiqi
    Liu, Qin
    Liu, Sicong
    Du, Bowen
    IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, 2025, 18 : 6710 - 6720
  • [43] Transformer-based Cross attention and Feature Diversity for Occluded Person Re-identification
    Kang S.
    Kim S.
    Seo K.
    Transactions of the Korean Institute of Electrical Engineers, 2023, 72 (01) : 108 - 113
  • [44] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
    Xuqiang Zhuang
    Fangai Liu
    Jian Hou
    Jianhua Hao
    Xiaohong Cai
    Neural Processing Letters, 2022, 54 : 1943 - 1960
  • [45] Streaming Transformer-based Acoustic Models Using Self-attention with Augmented Memory
    Wu, Chunyang
    Wang, Yongqiang
    Shi, Yangyang
    Yeh, Ching-Feng
    Zhang, Frank
    INTERSPEECH 2020, 2020, : 2132 - 2136
  • [46] Transformer-Based Interactive Multi-Modal Attention Network for Video Sentiment Detection
    Zhuang, Xuqiang
    Liu, Fangai
    Hou, Jian
    Hao, Jianhua
    Cai, Xiaohong
    NEURAL PROCESSING LETTERS, 2022, 54 (03) : 1943 - 1960
  • [47] Self-and-Mixed Attention Decoder with Deep Acoustic Structure for Transformer-based LVCSR
    Zhou, Xinyuan
    Lee, Grandee
    Yilmaz, Emre
    Long, Yanhua
    Liang, Jiaen
    Li, Haizhou
    INTERSPEECH 2020, 2020, : 5016 - 5020
  • [48] Regularizing Transformer-based Acoustic Models by Penalizing Attention Weights for Robust Speech Recognition
    Lee, Mun-Hak
    Lee, Sang-Eon
    Seong, Ju-Seok
    Chang, Joon-Hyuk
    Kwon, Haeyoung
    Park, Chanhee
    INTERSPEECH 2022, 2022, : 56 - 60
  • [49] RLFAT: A Transformer-Based Relay Link Forged Attack Detection Mechanism in SDN
    Zhang, Tianyi
    Wang, Yong
    ELECTRONICS, 2023, 12 (10)
  • [50] Skin Lesion Segmentation Improved by Transformer-Based Networks with Inter-scale Dependency Modeling
    Eskandari, Sania
    Lumpp, Janet
    Giraldo, Luis Sanchez
    MACHINE LEARNING IN MEDICAL IMAGING, MLMI 2023, PT I, 2024, 14348 : 351 - 360