A Convolutional Gated Recurrent Network for Speech Enhancement

Cited by: 0
Authors
Yuan W.-H. [1 ]
Hu S.-D. [1 ]
Shi Y.-L. [1 ]
Li Z. [1 ]
Liang C.-Y. [1 ]
Affiliation
[1] College of Computer Science and Technology, Shandong University of Technology, Zibo 255000, Shandong, China
Source
Tien Tzu Hsueh Pao/Acta Electronica Sinica | 2020, Vol. 48, No. 7
Keywords
Convolutional neural network; Deep neural network; Gated recurrent unit; Speech enhancement
DOI
10.3969/j.issn.0372-2112.2020.07.005
Abstract
To improve the performance of speech enhancement networks by making full use of noisy speech features, and exploiting the correlation of noisy speech across time and frequency, this paper designs a convolutional gated recurrent network for speech enhancement that combines the local feature extraction ability of convolutional neural networks with the long-term dependency modeling ability of gated recurrent units. The network replaces the fully connected structure in the gated recurrent unit with a convolutional structure to improve the feature computation process, and can therefore better preserve the time-frequency structure of the noisy speech features. Experimental results show that, compared with other speech enhancement networks, the proposed network has clear advantages in retaining speech components and suppressing noise components, and the enhanced speech has better quality and intelligibility. © 2020, Chinese Institute of Electronics. All rights reserved.
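As an illustration of the mechanism described in the abstract, the following is a minimal sketch of a convolutional gated recurrent cell in PyTorch, in which the fully connected transforms of the standard GRU gates are replaced by convolutions over the frequency axis so that the local time-frequency structure of the input features is preserved. The class name ConvGRUCell, the 1-D frequency-axis convolution, the kernel size, and the channel counts are illustrative assumptions, not the authors' exact configuration.

import torch
import torch.nn as nn

class ConvGRUCell(nn.Module):
    # Hypothetical convolutional GRU cell: the update gate, reset gate, and
    # candidate state are computed with convolutions instead of fully
    # connected layers (an assumed realization of the idea in the abstract).
    def __init__(self, in_channels, hidden_channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2  # keep the number of frequency bins unchanged
        self.conv_z = nn.Conv1d(in_channels + hidden_channels, hidden_channels,
                                kernel_size, padding=padding)  # update gate
        self.conv_r = nn.Conv1d(in_channels + hidden_channels, hidden_channels,
                                kernel_size, padding=padding)  # reset gate
        self.conv_h = nn.Conv1d(in_channels + hidden_channels, hidden_channels,
                                kernel_size, padding=padding)  # candidate state

    def forward(self, x, h):
        # x: (batch, in_channels, freq_bins), feature of the current frame
        # h: (batch, hidden_channels, freq_bins), hidden state of the previous frame
        xh = torch.cat([x, h], dim=1)
        z = torch.sigmoid(self.conv_z(xh))
        r = torch.sigmoid(self.conv_r(xh))
        h_tilde = torch.tanh(self.conv_h(torch.cat([x, r * h], dim=1)))
        return (1 - z) * h + z * h_tilde  # new hidden state

# Example: run the cell over the frames of a noisy spectrogram feature
# (batch of 8 utterances, 100 frames, 257 frequency bins; sizes are arbitrary).
cell = ConvGRUCell(in_channels=1, hidden_channels=16)
frames = torch.randn(8, 100, 1, 257)
h = torch.zeros(8, 16, 257)
for t in range(frames.size(1)):
    h = cell(frames[:, t], h)

Because each gate is a convolution shared across frequency bins, neighbouring bins are combined locally at every time step, which is the property the abstract credits for better retaining the time-frequency structure of noisy speech features.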
Pages: 1276-1283
Page count: 7
References
26 entries in total
[1] CHEN Nan, BAO Chang-chun, Speech enhancement method based on binaural cues coding principle, Acta Electronica Sinica, 47, 1, pp. 227-233, (2019)
[2] OU Shifeng, SONG Peng, GAO Ying, Laplacian speech model and soft decision based MMSE estimator for noise power spectral density in speech enhancement, Chinese Journal of Electronics, 27, 6, pp. 1214-1220, (2018)
[3] LIU Wenju, NIE Shuai, LIANG Shan, et al., Deep learning based speech separation technology and its developments, Acta Automatica Sinica, 42, 6, pp. 819-833, (2016)
[4] WANG D L, CHEN J., Supervised speech separation based on deep learning: An overview, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26, 10, pp. 1702-1726, (2018)
[5] WANG Y, WANG D L., Towards scaling up classification-based speech separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 21, 7, pp. 1381-1390, (2013)
[6] WANG Y, NARAYANAN A, WANG D L., On training targets for supervised speech separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22, 12, pp. 1849-1858, (2014)
[7] WILLIAMSON D S, WANG D L., Time-frequency masking in the complex domain for speech dereverberation and denoising, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 25, 7, pp. 1492-1501, (2017)
[8] XU Y, DU J, DAI L R, et al., An experimental study on speech enhancement based on deep neural networks, IEEE Signal Processing Letters, 21, 1, pp. 65-68, (2014)
[9] XU Y, DU J, DAI L R, et al., A regression approach to speech enhancement based on deep neural networks, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 1, pp. 7-19, (2015)
[10] HUANG P S, KIM M, HASEGAWA-JOHNSON M, et al., Joint optimization of masks and deep recurrent neural networks for monaural source separation, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 23, 12, pp. 2136-2147, (2015)