Improving Speech Enhancement in Unseen Noise Using Deep Convolutional Neural Network

Cited by: 0
Authors
Yuan W.-H. [1]
Sun W.-Z. [1]
Xia B. [1]
Ou S.-F. [2]
Affiliations
[1] College of Computer Science and Technology, Shandong University of Technology, Zibo
[2] Institute of Science and Technology for Opto-electronic Information, Yantai University, Yantai
Source
Zidonghua Xuebao/Acta Automatica Sinica | 2018, Vol. 44, No. 4
Funding
National Natural Science Foundation of China
Keywords
Deep convolutional neural network (DCNN); Deep neural network (DNN); Noise; Speech enhancement
DOI
10.16383/j.aas.2018.c170001
Abstract
To further improve the performance of deep-learning-based speech enhancement in unseen noise, this paper focuses on the neural network architecture. Because speech and noise signals exhibit strong local correlations in both the time and frequency domains, a deep convolutional neural network (DCNN) is used to model the complex nonlinear relationship between noisy speech and clean speech. By designing effective training features and an effective training target, and by establishing an appropriate network architecture, a DCNN-based speech enhancement method is proposed. Experimental results show that, under unseen noise conditions, the proposed method significantly outperforms deep neural network (DNN)-based methods in terms of both speech quality and intelligibility. Copyright © 2018 Acta Automatica Sinica. All rights reserved.
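The abstract's central premise is that speech and noise show strong local correlations in time and frequency, which a convolutional layer exploits by sliding the same small kernel over every time-frequency position of a spectrogram. The following is a minimal pure-Python sketch of one such layer (a "valid" 2-D cross-correlation followed by ReLU) applied to a patch of frames x frequency bins. The function names, patch values, kernel, and bias are illustrative assumptions only; they are not the paper's features, training target, or network configuration, which the abstract does not specify.

```python
from typing import List

Matrix = List[List[float]]  # rows = time frames, columns = frequency bins


def conv2d_valid(x: Matrix, kernel: Matrix, bias: float = 0.0) -> Matrix:
    """'Valid' 2-D cross-correlation, the operation a CNN layer computes.

    Each output value depends only on a small kt x kf time-frequency
    neighbourhood of x, and the same kernel weights are shared across
    all positions (illustrative sketch, not the paper's layer).
    """
    kt, kf = len(kernel), len(kernel[0])
    out_t, out_f = len(x) - kt + 1, len(x[0]) - kf + 1
    out: Matrix = []
    for t in range(out_t):
        row = []
        for f in range(out_f):
            acc = bias
            for i in range(kt):
                for j in range(kf):
                    acc += kernel[i][j] * x[t + i][f + j]
            row.append(acc)
        out.append(row)
    return out


def relu(x: Matrix) -> Matrix:
    """Element-wise rectified linear unit, the nonlinearity between layers."""
    return [[max(0.0, v) for v in row] for row in x]


if __name__ == "__main__":
    # Hypothetical 3-frame x 4-bin magnitude patch of noisy speech.
    patch = [[1, 2, 0, 1],
             [0, 1, 3, 1],
             [2, 0, 1, 0]]
    # Hypothetical 2x2 all-ones kernel with bias -3: responds only where
    # the local time-frequency energy exceeds 3.
    kernel = [[1, 1],
              [1, 1]]
    print(relu(conv2d_valid(patch, kernel, bias=-3.0)))
    # prints [[1.0, 3.0, 2.0], [0.0, 2.0, 2.0]]
```

Stacking several such layers enlarges the time-frequency region each output depends on while keeping the number of weights small, which is the usual motivation for convolution over fully connected DNN layers on spectrogram inputs.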
Pages: 751-759
Number of pages: 8