An efficient joint training model for monaural noisy-reverberant speech recognition

Cited by: 0
Authors
Lian, Xiaoyu [1 ]
Xia, Nan [1 ]
Dai, Gaole [1 ]
Yang, Hongqin [1 ]
Affiliations
[1] Dalian Polytech Univ, Sch Informat Sci & Engn, Dalian 116034, Liaoning, Peoples R China
Keywords
Deep learning; Speech enhancement; Speech recognition; Attention mechanism; Joint training; NETWORKS; ENHANCEMENT; FRAMEWORK; SIGNAL
DOI
10.1016/j.apacoust.2024.110322
Chinese Library Classification
O42 [Acoustics]
Subject Classification Codes
070206; 082403
Abstract
Noise and reverberation can severely degrade speech quality and intelligibility, harming the performance of downstream speech recognition tasks. This paper constructs a joint training network for speech recognition in monaural noisy-reverberant environments. In the speech enhancement model, complex-valued channel and temporal-frequency attention (CCTFA) is integrated to focus on the key features of the complex spectrum, and the CCTFA network (CCTFANet) is constructed to reduce the influence of noise and reverberation. In the speech recognition model, an element-wise linear attention (EWLA) module is proposed to linearize the attention complexity and reduce the parameters and computation required by the attention module; the EWLA Conformer (EWLAC) is then constructed as an efficient end-to-end speech recognition model. On an open-source dataset, joint training of CCTFANet with EWLAC reduces the character error rate (CER) by 3.27%. Compared with other speech recognition models, EWLAC maintains CER while requiring far fewer parameters, less computation, and delivering higher inference speed.
Pages: 13
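The abstract's central efficiency claim is that the EWLA module linearizes attention complexity. The paper's exact element-wise formulation is not given here, so the sketch below shows the general kernelized linear-attention idea it builds on: replacing the softmax with a non-negative feature map lets the key-value product be computed once, dropping the cost from O(N²·d) to O(N·d²) in sequence length N. The feature map `phi` and all array names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def linear_attention(Q, K, V, phi=lambda x: np.maximum(x, 0.0) + 1e-6):
    """Kernelized linear attention: O(N * d^2) instead of O(N^2 * d)."""
    Qp, Kp = phi(Q), phi(K)        # non-negative feature maps of queries/keys
    kv = Kp.T @ V                  # (d, d_v): global key-value summary
    z = Qp @ Kp.sum(axis=0)        # (N,): per-query normalizer
    return (Qp @ kv) / z[:, None]  # (N, d_v) attention output

rng = np.random.default_rng(0)
N, d = 16, 8
Q, K, V = rng.normal(size=(3, N, d))  # unpack three (N, d) matrices
out = linear_attention(Q, K, V)
print(out.shape)                       # (16, 8)
```

Because `kv` and the normalizer are fixed-size regardless of N, memory and compute grow linearly with sequence length, which is what allows a Conformer-style recognizer to cut parameters and inference cost while keeping attention-like mixing.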