ILMSAF based speech enhancement with DNN and noise classification

Cited by: 16
Authors
Li, Ruwei [1]
Liu, Yanan [1]
Shi, Yongqiang [1]
Dong, Liang [2]
Cui, Weili [3]
Affiliations
[1] Beijing Univ Technol, Sch Elect Informat & Control Engn, Beijing 100124, Peoples R China
[2] Baylor Univ, Elect & Comp Engn, Waco, TX 76798 USA
[3] Wilkes Univ, Wilkes Barre, PA 18704 USA
Keywords
Speech enhancement; Deep Belief Network; Noise classification; Improved Least Mean Square Adaptive Filtering; Deep Neural Network
DOI
10.1016/j.specom.2016.10.008
Chinese Library Classification (CLC) number
O42 [Acoustics]
Subject classification codes
070206; 082403
Abstract
To improve the performance of speech enhancement algorithms in complex noise environments with low Signal-to-Noise Ratio (SNR), a novel Improved Least Mean Square Adaptive Filtering (ILMSAF) based speech enhancement algorithm with a Deep Neural Network (DNN) and noise classification is proposed. An adaptive coefficient for the filter's parameters is introduced into conventional Least Mean Square Adaptive Filtering (LMSAF). First, this adaptive coefficient is estimated by a Deep Belief Network (DBN). Then, the enhanced speech is obtained by ILMSAF. In addition, to make the approach suitable for various kinds of noise environments, a new DNN-based noise classification method is presented. According to the noise classification result, the corresponding ILMSAF model is selected in the enhancement process. Performance test results under ITU-T G.160 show that the proposed algorithm tends to achieve significant improvements, on various subjective and objective speech quality measures, over the Wiener filtering based speech enhancement approach with Weighted Denoising Auto-encoder and noise classification. (C) 2016 Elsevier B.V. All rights reserved.
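The abstract does not state how the adaptive coefficient enters the LMS update, so the following is only a minimal sketch of a conventional LMS adaptive filter with a hypothetical per-sample coefficient `alpha` that scales the step size. The noise-canceller setup with a separate reference input, the function name `lms_filter`, and the parameters `num_taps`, `mu`, and `alpha` are all assumptions for illustration. In the paper the coefficient is estimated by a DBN; here it is simply passed in as an array.

```python
import numpy as np

def lms_filter(noisy, reference, num_taps=4, mu=0.01, alpha=None):
    """Conventional LMS noise canceller (illustrative sketch, not the paper's ILMSAF).

    noisy:     primary input d[n] = speech + noise
    reference: reference input x[n], assumed correlated with the noise
    alpha:     hypothetical per-sample scaling of the step size (defaults to 1)
    Returns the error signal e[n] = d[n] - w^T x_n, used as the speech estimate.
    """
    noisy = np.asarray(noisy, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = len(noisy)
    if alpha is None:
        alpha = np.ones(n)
    w = np.zeros(num_taps)          # filter weights
    out = np.zeros(n)
    for i in range(n):
        # Most recent num_taps reference samples, newest first, zero padded.
        start = max(0, i - num_taps + 1)
        x = np.zeros(num_taps)
        seg = reference[start:i + 1][::-1]
        x[:len(seg)] = seg
        e = noisy[i] - w @ x        # estimation error = enhanced sample
        w = w + alpha[i] * mu * e * x   # LMS weight update, scaled by alpha
        out[i] = e
    return out
```

With `alpha` fixed at 1 this reduces to the standard LMS update; a model-predicted `alpha` would speed up or slow down adaptation sample by sample under this assumed formulation.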
Pages: 53-70
Page count: 18