Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement

Cited by: 2

Authors:

Xu, Ziyi [1]

Strake, Maximilian [1]

Fingscheidt, Tim [1]

Affiliation:

[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol, Braunschweig, Germany

Source:

2019 27th European Signal Processing Conference (EUSIPCO) | 2019

Keywords:

Speech enhancement; noise reduction; DNN; noisy speech target; deep neural network; excitation; ratio

DOI:

10.23919/eusipco.2019.8903066

Chinese Library Classification (CLC) codes:

TM [Electrical engineering]; TN [Electronic and communication technology]

Discipline codes:

0808; 0809

Abstract:

Estimating time-frequency domain masks for speech enhancement using deep learning approaches has recently become a popular field of research. In this paper, we propose a mask-based speech enhancement framework using concatenated identical deep neural networks (CI-DNNs). The idea is that a single DNN is trained under multiple input and output signal-to-noise power ratio (SNR) conditions, using targets that provide a moderate SNR gain with respect to the input and therefore achieve a balance between speech component quality and noise suppression. We concatenate this single DNN several times without any retraining to provide sufficient noise attenuation. Simulation results show that our proposed CI-DNN outperforms enhancement methods using classical spectral weighting rules with respect to total speech quality and speech intelligibility. Moreover, our approach shows similar or even slightly better performance with far fewer trainable parameters than a noisy-target single DNN approach of the same overall size. A comparison to the conventional clean-target single DNN approach shows that our proposed CI-DNN is better in speech component quality and much better in residual noise component quality. Most importantly, our new CI-DNN generalizes best to an unseen noise type compared with the other tested deep learning approaches.
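The two ingredients named in the abstract, a noisy-speech training target with a moderate SNR gain and the repeated use of one fixed mask estimator, can be illustrated with a minimal sketch. Everything below is an assumption for illustration and not the authors' implementation: the helper names (`noisy_target`, `component_snr_db`, `toy_mask`, `ci_enhance`), the noise scaling factor alpha = 10^(-ΔSNR/20), the simple addition of magnitudes, and the Wiener-like `toy_mask` that stands in for the trained DNN.

```python
import math
from typing import Callable, List

Spectrum = List[float]  # toy magnitude spectrum of one frame (real-valued)


def component_snr_db(speech: Spectrum, noise: Spectrum, alpha: float = 1.0) -> float:
    """SNR in dB of a signal whose components are `speech` and `alpha * noise`."""
    ps = sum(x * x for x in speech)
    pn = sum((alpha * x) ** 2 for x in noise)
    return 10.0 * math.log10(ps / pn)


def noisy_target(speech: Spectrum, noise: Spectrum, snr_gain_db: float) -> Spectrum:
    """Hypothetical noisy-speech target: speech plus noise scaled by alpha.

    With alpha = 10**(-snr_gain_db / 20), the target's component SNR exceeds
    the input SNR by exactly snr_gain_db. Adding magnitudes directly is a
    simplifying assumption of this sketch.
    """
    alpha = 10.0 ** (-snr_gain_db / 20.0)
    return [s + alpha * n for s, n in zip(speech, noise)]


def toy_mask(mag: Spectrum, noise_power: float = 1.0) -> Spectrum:
    """Stand-in for the trained DNN (NOT the paper's network).

    A Wiener-like gain |Y|^2 / (|Y|^2 + c) with an assumed constant noise
    power c: it lies in [0, 1) and attenuates low-energy bins more strongly.
    """
    return [m * m / (m * m + noise_power) for m in mag]


def ci_enhance(mag: Spectrum,
               mask_fn: Callable[[Spectrum], Spectrum],
               stages: int) -> Spectrum:
    """Apply the same fixed mask estimator `stages` times, without retraining.

    Each stage estimates a mask from the previous stage's output and
    multiplies it onto that output, so attenuation accumulates.
    """
    out = list(mag)
    for _ in range(stages):
        mask = mask_fn(out)
        out = [g * m for g, m in zip(mask, out)]
    return out


if __name__ == "__main__":
    noisy = [0.5, 1.0, 4.0]  # low-, mid- and high-energy bins
    for k in (1, 2, 3):
        print(k, [round(x, 3) for x in ci_enhance(noisy, toy_mask, k)])
```

Running it shows the low-energy bin shrinking much faster from stage to stage than the high-energy bin, which mirrors the abstract's point that one moderately aggressive stage, concatenated several times, can reach stronger overall noise attenuation.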

Pages: 5

References: 34 in total

