Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

Times Cited: 4
Authors
Wei, Kun [1 ]
Guo, Pengcheng [1 ]
Jiang, Ning [2 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Comp Sci, Audio Speech & Language Proc Grp ASLP NPU, Xian, Peoples R China
[2] Mashang Consumer Finance Co Ltd, Beijing, Peoples R China
Source
INTERSPEECH 2022 | 2022
Keywords
End-to-end speech recognition; Transformer; Long context; Conversational ASR;
DOI
10.21437/Interspeech.2022-10066
Chinese Library Classification (CLC)
O42 [Acoustics];
Discipline Classification Codes
070206; 082403;
Abstract
Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and have even shown superior performance over the conventional hybrid framework. The main idea of Transformers is to capture the long-range global context within an utterance by self-attention layers. However, for scenarios like conversational speech, such utterance-level modeling neglects contextual dependencies that span across utterances. In this paper, we propose to explicitly model the inter-sentential information in a Transformer-based end-to-end architecture for conversational speech recognition. Specifically, for the encoder network, we capture the contexts of previous speech and incorporate such historical information into the current input by a context-aware residual attention mechanism. For the decoder, the prediction of the current utterance is also conditioned on historical linguistic information through a conditional decoder framework. We show the effectiveness of the proposed method on several open-source dialogue corpora, where it consistently improves performance over utterance-level Transformer-based ASR models.
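The abstract describes two mechanisms: a context-aware residual attention in the encoder that attends over encodings of preceding utterances, and a conditional decoder that conditions on previous transcripts. The following is a minimal PyTorch sketch of how the encoder-side idea could be realized; the module name, the learnable gate, and the residual fusion details are illustrative assumptions and not the paper's exact implementation.

# Illustrative sketch (assumed details, not the paper's exact code): self-attention
# over the current utterance plus cross-attention over cached encodings of previous
# utterances, fused through a gated residual connection.
import torch
import torch.nn as nn

class ContextAwareResidualAttention(nn.Module):
    def __init__(self, d_model: int = 256, n_heads: int = 4, dropout: float = 0.1):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ctx_attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.norm = nn.LayerNorm(d_model)
        # Learnable scalar gate controlling how much historic context is mixed in (an assumption).
        self.gate = nn.Parameter(torch.tensor(0.5))

    def forward(self, x: torch.Tensor, context: torch.Tensor) -> torch.Tensor:
        # x:       (batch, T_cur, d_model) frames of the current utterance
        # context: (batch, T_ctx, d_model) cached encoder states of preceding utterances
        h, _ = self.self_attn(x, x, x)             # intra-utterance dependencies
        c, _ = self.ctx_attn(x, context, context)  # inter-sentential dependencies
        return self.norm(x + h + self.gate * c)    # residual fusion of both attention paths

# Toy usage: 2 dialogues, 50 current frames, 120 frames of accumulated history.
x = torch.randn(2, 50, 256)
ctx = torch.randn(2, 120, 256)
out = ContextAwareResidualAttention()(x, ctx)
print(out.shape)  # torch.Size([2, 50, 256])

The decoder-side conditioning described in the abstract could be sketched analogously, with the cached context being token embeddings of previous transcripts rather than acoustic encoder states.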
Pages: 3804-3808
Page count: 5
Related Papers
50 records in total
  • [1] TRANSFORMER-BASED STREAMING ASR WITH CUMULATIVE ATTENTION
    Li, Mohan
    Zhang, Shucong
    Zorila, Catalin
    Doddipatla, Rama
    2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 8272 - 8276
  • [2] Attention Weight Smoothing Using Prior Distributions for Transformer-Based End-to-End ASR
    Maekaku, Takashi
    Fujita, Yuya
    Peng, Yifan
    Watanabe, Shinji
    INTERSPEECH 2022, 2022, : 1071 - 1075
  • [3] Improving Conversational Recommender Systems via Transformer-based Sequential Modelling
    Zou, Jie
    Kanoulas, Evangelos
    Ren, Pengjie
    Ren, Zhaochun
    Sun, Aixin
    Long, Cheng
    PROCEEDINGS OF THE 45TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '22), 2022, : 2319 - 2324
  • [4] Transformer-based multivariate time series anomaly detection using inter-variable attention mechanism
    Kang, Hyeongwon
    Kang, Pilsung
    KNOWLEDGE-BASED SYSTEMS, 2024, 290
  • [5] Improving scene text image captioning using transformer-based multilevel attention
    Srivastava, Swati
    Sharma, Himanshu
    JOURNAL OF ELECTRONIC IMAGING, 2023, 32 (03)
  • [6] HEAD-SYNCHRONOUS DECODING FOR TRANSFORMER-BASED STREAMING ASR
    Li, Mohan
    Zorila, Catalin
    Doddipatla, Rama
    2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021), 2021, : 5909 - 5913
  • [7] A novel transformer-based network with attention mechanism for automatic pavement crack detection
    Guo, Feng
    Liu, Jian
    Lv, Chengshun
    Yu, Huayang
    CONSTRUCTION AND BUILDING MATERIALS, 2023, 391
  • [8] Attention Calibration for Transformer-based Sequential Recommendation
    Zhou, Peilin
    Ye, Qichen
    Xie, Yueqi
    Gao, Jingqi
    Wang, Shoujin
    Kim, Jae Boum
    You, Chenyu
    Kim, Sunghun
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 3595 - 3605
  • [9] Stochastic Attention Head Removal: A Simple and Effective Method for Improving Transformer Based ASR Models
    Zhang, Shucong
    Loweimi, Erfan
    Bell, Peter
    Renals, Steve
    INTERSPEECH 2021, 2021, : 2541 - 2545
  • [10] Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies
    Li, Zehan
    Miao, Haoran
    Deng, Keqi
    Cheng, Gaofeng
    Tian, Sanli
    Li, Ta
    Yan, Yonghong
    INTERSPEECH 2022, 2022, : 1671 - 1675