An efficient joint training model for monaural noisy-reverberant speech recognition

Citations: 0

Authors:

Lian, Xiaoyu [1]

Xia, Nan [1]

Dai, Gaole [1]

Yang, Hongqin [1]

Affiliations:

[1] Dalian Polytech Univ, Sch Informat Sci & Engn, Dalian 116034, Liaoning, Peoples R China

Source:

APPLIED ACOUSTICS | 2025 / Vol. 228

Keywords:

Deep learning; Speech enhancement; Speech recognition; Attention mechanism; Joint training; NETWORKS; ENHANCEMENT; FRAMEWORK; SIGNAL;

DOI:

10.1016/j.apacoust.2024.110322

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

Noise and reverberation can seriously reduce speech quality and intelligibility, degrading the performance of downstream speech recognition tasks. This paper constructs a jointly trained network for speech recognition in monaural noisy-reverberant environments. In the speech enhancement model, a complex-valued channel and temporal-frequency attention (CCTFA) module is integrated to focus on the key features of the complex spectrum. The CCTFA network (CCTFANet) is then constructed to reduce the influence of noise and reverberation. In the speech recognition model, an element-wise linear attention (EWLA) module is proposed to linearize the attention complexity and reduce the number of parameters and computations required by the attention module. The EWLA Conformer (EWLAC) is then constructed as an efficient end-to-end speech recognition model. On an open-source dataset, joint training of CCTFANet with EWLAC reduces the character error rate (CER) by 3.27%. Compared to other speech recognition models, EWLAC maintains its CER while achieving a much lower parameter count, lower computational overhead, and higher inference speed.
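The abstract reports its results as CER but does not define the metric. Assuming the standard definition of character error rate (the character-level edit distance between the reference transcript and the recognized hypothesis, divided by the reference length), a minimal sketch of the computation could look like this. The function names `edit_distance` and `cer` are illustrative and are not taken from the paper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two character sequences."""
    # prev[j] holds the distance between the processed prefix of ref
    # and the first j characters of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate under the assumed standard definition."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)
```

Under this definition a CER can exceed 1.0 when the hypothesis contains many insertions. Whether the reported 3.27% is an absolute or a relative reduction is not stated in the abstract.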


Pages: 13
