Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Cited by: 4
Authors
Wei, Kun [1 ]
Guo, Pengcheng [1 ]
Jiang, Ning [2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xi'an, Peoples R China
[2] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
End-to-end speech recognition; Transformer; Long context; Conversational ASR;
DOI
10.21437/Interspeech.2022-10066
CLC Number
O42 [Acoustics];
Discipline Codes
070206 ; 082403 ;
Abstract
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance through self-attention layers. However, in scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span across utterances. In this paper, we propose to explicitly model the inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of preceding speech and incorporate this historical information into the current input through a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on historical linguistic information through a conditional decoder framework. We show the effectiveness of the proposed method on several open-source dialogue corpora, where it consistently improves performance over utterance-level Transformer-based ASR models.
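To make the encoder-side mechanism concrete, below is a minimal PyTorch sketch of how a context-aware residual attention block might look: self-attention over the current utterance is fused, via a residual connection, with cross-attention over cached representations of previous utterances. The class name `ContextAwareResidualAttention`, the layer layout, and all hyperparameters are illustrative assumptions, not the authors' released implementation; the decoder-side conditioning would analogously feed historical token representations as the cross-attention memory.

```python
import torch
import torch.nn as nn


class ContextAwareResidualAttention(nn.Module):
    """Hypothetical sketch of a context-aware residual attention block.

    Self-attention over the current utterance is augmented by a
    cross-attention branch that attends to cached representations of
    previous utterances; both branches are fused through a residual
    connection. Names and sizes are assumptions for illustration.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 4):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (batch, T_cur, d_model) current utterance features
        # context: (batch, T_ctx, d_model) cached features of previous utterances
        intra, _ = self.self_attn(x, x, x)             # within-utterance dependencies
        inter, _ = self.ctx_attn(x, context, context)  # inter-sentential dependencies
        return self.norm(x + intra + inter)            # residual fusion of both branches


# Toy usage: a 50-frame current utterance with 120 frames of cached history.
layer = ContextAwareResidualAttention()
cur = torch.randn(2, 50, 256)
hist = torch.randn(2, 120, 256)
out = layer(cur, hist)
print(out.shape)  # torch.Size([2, 50, 256])
```

Under these assumptions, the cross-attention branch contributes nothing when no history is available (e.g., the first utterance of a dialogue), which a practical implementation would handle by masking or by passing a learned placeholder context.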
Pages: 3804 - 3808
Number of pages: 5