Joint Speech and Noise Estimation Using SNR-Adaptive Target Learning for Deep-Learning-Based Speech Enhancement

Cited by: 0
Authors
Li, Xiaoran [1]
Guo, Zilu [1]
Du, Jun [1]
Lee, Chin-Hui [2]
Gao, Yu
Zhang, Wenbin [3]
Affiliations
[1] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[2] Georgia Inst Technol, Atlanta, GA USA
[3] Midea Grp Co Ltd, AI Innovat Ctr, Shanghai, Peoples R China
Source
MAN-MACHINE SPEECH COMMUNICATION, NCMMSC 2023 | 2024 / Vol. 2006
Keywords
speech recognition; speech enhancement; adaptive noise reduction; neural network
DOI
10.1007/978-981-97-0601-3_8
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
In this paper, we propose an SNR-adaptive target learning strategy and apply it to a joint speech and noise estimation network to address the mismatch between the speech enhancement (SE) and automatic speech recognition (ASR) modules. Progressive learning (PL) methods have shown that retaining residual noise in the training targets of the enhancement model helps alleviate this mismatch. Inspired by this, we adopt an SNR-adaptive target learning strategy that optimizes the SNR targets for the SE model, so that the enhancement model learns adaptive denoising in a data-driven manner and its performance on the backend ASR task improves further. We then extend the strategy to a joint speech and noise estimation network and validate its adaptability with the noise prediction branch. We demonstrate the effectiveness of the proposed method on a public benchmark, achieving a significant relative word error rate (WER) reduction of approximately 37% compared with unprocessed noisy speech.
Pages: 92-101
Number of pages: 10