A Perceptually Motivated Approach for Speech Enhancement Based on Deep Neural Network

Cited by: 2
Authors
Han, Wei [1 ]
Zhang, Xiongwei [1 ]
Min, Gang [1 ,2 ]
Sun, Meng [1 ]
Affiliations
[1] PLA Univ Sci & Technol, Lab Intelligence Informat Proc, Nanjing, Jiangsu, Peoples R China
[2] Xian Commun Inst, Xian, Peoples R China
Keywords
perceptually motivated; deep neural network; speech enhancement; masking residual noise; separation
DOI
10.1587/transfun.E99.A.835
Chinese Library Classification (CLC)
TP3 [Computing Technology, Computer Technology]
Subject Classification Code
0812
Abstract
In this letter, a novel perceptually motivated single-channel speech enhancement approach based on a Deep Neural Network (DNN) is presented. Exploiting the masking properties of the human auditory system, a new DNN architecture is proposed to reduce the perceptual effect of the residual noise. The network is trained directly to learn a gain function that simultaneously estimates the power spectrum of the clean speech and shapes the spectrum of the residual noise. Experimental results demonstrate that the proposed approach achieves better objective speech quality on TIMIT sentences corrupted by various types of noise, regardless of whether the noise conditions are included in the training set.
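The letter's record gives no implementation details, so the following is only a minimal sketch of the general gain-function idea summarized above: a small feed-forward DNN maps noisy log-power-spectrum frames to a gain in [0, 1], which is applied to the noisy power spectrum to estimate the clean power spectrum. This is not the authors' architecture; the network sizes, the GainDNN name, and the placeholder training data are illustrative assumptions, and the perceptual residual-noise shaping that distinguishes the letter's approach is not modeled here.

```python
# Minimal illustrative sketch (NOT the authors' model): a feed-forward DNN
# that learns a spectral gain in [0, 1] from noisy log-power-spectrum frames.
# The gain multiplies the noisy power spectrum to estimate the clean power
# spectrum. Perceptual residual-noise shaping from the letter is not modeled.
import torch
import torch.nn as nn

N_BINS = 257  # e.g. 512-point FFT -> 257 frequency bins (assumption)

class GainDNN(nn.Module):
    def __init__(self, n_bins=N_BINS, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_bins, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_bins), nn.Sigmoid(),  # gain bounded to [0, 1]
        )

    def forward(self, noisy_log_power):
        return self.net(noisy_log_power)

def train_step(model, opt, noisy_power, clean_power, eps=1e-8):
    """One gradient step: gain * noisy power should approximate the clean
    power spectrum (plain MSE here, with no perceptual weighting)."""
    gain = model(torch.log(noisy_power + eps))
    loss = nn.functional.mse_loss(gain * noisy_power, clean_power)
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

if __name__ == "__main__":
    torch.manual_seed(0)
    model = GainDNN()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Placeholder data: random "clean" power spectra plus additive noise power.
    clean = torch.rand(32, N_BINS)
    noisy = clean + 0.5 * torch.rand(32, N_BINS)
    for step in range(5):
        print(f"step {step}: loss={train_step(model, opt, noisy, clean):.4f}")
```

In a real system the placeholder tensors would be replaced by STFT power spectra of paired noisy/clean training utterances, and the training objective would additionally shape the residual noise spectrum as described in the letter.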
Pages: 835-838
Number of pages: 4