Multi-resolution auditory cepstral coefficient and adaptive mask for speech enhancement with deep neural network

Times cited: 8
Authors
Li, Ruwei [1 ]
Sun, Xiaoyue [1 ]
Liu, Yanan [1 ]
Yang, Dengcai [1 ]
Dong, Liang [2 ]
Affiliations
[1] Beijing Univ Technol, Sch Informat & Commun Engn, Fac Informat Technol, Beijing Key Lab Computat Intelligence & Intellige, Beijing, Peoples R China
[2] Baylor Univ, Elect & Comp Engn, Waco, TX 76798 USA
Funding
National Natural Science Foundation of China;
关键词
Speech enhancement; Deep neural network; Multi-resolution auditory cepstral coefficient; Adaptive mask; Noise;
DOI
10.1186/s13634-019-0618-4
CLC number
TM [Electrical engineering]; TN [Electronic technology, communication technology];
Discipline codes
0808; 0809;
Abstract
The performance of existing speech enhancement algorithms degrades in low signal-to-noise ratio (SNR), non-stationary noise environments. To address this problem, this paper presents a novel speech enhancement algorithm based on multi-resolution features and an adaptive mask with deep learning. First, we construct a new feature, the multi-resolution auditory cepstral coefficient (MRACC). Extracted from four cochleagrams of different resolutions, this feature captures both local information and spectrotemporal context while reducing algorithm complexity. Second, an adaptive mask (AM) that can track noise changes is proposed; it flexibly combines the advantages of the ideal binary mask (IBM) and the ideal ratio mask (IRM) as the SNR changes. Third, a deep neural network (DNN) is used as a nonlinear function to estimate the adaptive mask, with MRACC and its first and second derivatives as the input. Finally, the estimated AM is applied as a weight to the noisy speech to obtain the enhanced speech. Experimental results show that the proposed algorithm not only further improves speech quality and intelligibility but also suppresses more noise than the baseline algorithms, while having lower complexity.
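To make the adaptive-mask idea concrete, the sketch below shows one plausible way to blend an IBM and an IRM per time-frequency unit using a sigmoid weight on the local SNR. This is an illustrative assumption, not the authors' exact formulation: the abstract only states that the AM combines the advantages of IBM and IRM as the SNR changes, and the `adaptive_mask` function, its `alpha` parameter, and the 0 dB IBM threshold are hypothetical choices.

```python
import numpy as np

def adaptive_mask(speech_power, noise_power, alpha=1.0):
    """Illustrative adaptive mask blending IBM and IRM by local SNR.

    Hypothetical sketch (not the paper's exact formula): the weight w
    leans toward the aggressive IBM at high SNR and toward the softer
    IRM at low SNR.
    """
    eps = 1e-12
    snr = speech_power / np.maximum(noise_power, eps)      # local SNR per T-F unit
    ibm = (snr >= 1.0).astype(float)                       # ideal binary mask, 0 dB threshold
    irm = speech_power / (speech_power + noise_power + eps)  # ideal ratio mask
    snr_db = 10.0 * np.log10(np.maximum(snr, eps))         # SNR in dB
    w = 1.0 / (1.0 + np.exp(-alpha * snr_db))              # sigmoid weight in [0, 1]
    return w * ibm + (1.0 - w) * irm                       # adaptive blend

# Enhanced speech is then obtained by weighting the noisy spectrum:
# enhanced_mag = adaptive_mask(speech_power, noise_power) * noisy_mag
```

Because the sigmoid saturates, the blend behaves like the IBM in clearly speech-dominated units and like the IRM where speech and noise powers are comparable, which matches the stated motivation of tracking SNR changes.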
Pages: 16