Speech Enhancement Algorithm Combining Cochlear Features and Deep Neural Network with Skip Connections

Cited by: 0

Authors:

Chaofeng Lan

Yuqiao Wang

Lei Zhang

Zelong Yu

Chundong Liu

Xiaoxia Guo

Affiliations:

[1] Harbin University of Science and Technology, School of Measurement and Communication Engineering

[2] Beidahuang Industry Group General Hospital

Source:

Journal of Signal Processing Systems | 2023 / Vol. 95

Keywords:

Speech enhancement; DNN; Skip connections; MRCG; Low SNR;

DOI:

Not available

Abstract:

To address the poor performance of traditional deep learning-based speech enhancement algorithms in low signal-to-noise ratio (SNR) scenarios, this paper proposes a method that combines the front-end processing Multi-Resolution Cochleagram (FP-MRCG) with a deep neural network with skip connections (Skip-DNN). The method trains Skip-DNN on FP-MRCG speech features to estimate the ideal ratio mask, applies the mask to filter the background noise out of the noisy speech and obtain the enhanced speech features, and then obtains the enhanced speech by phase reconstruction. The results show that at an SNR of 0 dB, with FP-MRCG as the Skip-DNN input, the enhanced speech reaches an average perceptual evaluation of speech quality (PESQ) of 2.5283 and an average short-time objective intelligibility (STOI) of 0.8825, which are 3% and 1.7% higher than with MRCG, respectively. In addition, when FP-MRCG is used as the input of a DNN, Skip-DNN, and convolutional neural network (CNN), Skip-DNN scores higher in low-SNR environments and CNN scores higher in high-SNR environments. However, training the CNN takes twice as long as training Skip-DNN. The paper therefore concludes that Skip-DNN performs better in speech enhancement than the other two networks.
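The masking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a generic complex time-frequency representation, computes the ideal ratio mask from known speech and noise powers (in the paper the mask is estimated by the network), uses an exponent of 0.5 (a common choice that the abstract does not state), and treats "phase reconstruction" as reusing the noisy phase. The function names are hypothetical.

```python
import cmath
import math


def ideal_ratio_mask(speech_power, noise_power, beta=0.5):
    """IRM per time-frequency unit: (S / (S + N)) ** beta.

    beta = 0.5 is an assumed, commonly used exponent.
    """
    mask = []
    for s_row, n_row in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_row, n_row):
            total = s + n
            # Guard against empty units where both powers are zero.
            row.append((s / total) ** beta if total > 0 else 0.0)
        mask.append(row)
    return mask


def apply_mask_with_noisy_phase(noisy_spectrum, mask):
    """Scale each noisy magnitude by the mask and keep the noisy phase."""
    return [
        [cmath.rect(m * abs(x), cmath.phase(x)) for x, m in zip(x_row, m_row)]
        for x_row, m_row in zip(noisy_spectrum, mask)
    ]


# Toy 2x2 example (rows are frames, columns are frequency channels).
speech_power = [[4.0, 0.0], [1.0, 9.0]]
noise_power = [[0.0, 1.0], [1.0, 0.0]]
noisy_spectrum = [[2 + 0j, 0 + 1j], [1 + 1j, -3 + 0j]]

mask = ideal_ratio_mask(speech_power, noise_power)
enhanced = apply_mask_with_noisy_phase(noisy_spectrum, mask)
```

In this toy example, units dominated by speech pass through almost unchanged, units dominated by noise are attenuated toward zero, and the phase of every unit is left as it was in the noisy input.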

Pages: 979-989

Page count: 10
