Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement

Cited by: 2
Authors
Xu, Ziyi [1 ]
Strake, Maximilian [1 ]
Fingscheidt, Tim [1 ]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol, Braunschweig, Germany
Source
2019 27th European Signal Processing Conference (EUSIPCO) | 2019
Keywords
Speech enhancement; noise reduction; DNN; noisy speech target; deep neural network; excitation; ratio
DOI
10.23919/eusipco.2019.8903066
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Estimating time-frequency domain masks for speech enhancement using deep learning approaches has recently become a popular field of research. In this paper, we propose a mask-based speech enhancement framework using concatenated identical deep neural networks (CI-DNNs). The idea is that a single DNN is trained under multiple input and output signal-to-noise power ratio (SNR) conditions, using targets that provide a moderate SNR gain with respect to the input and therefore achieve a balance between speech component quality and noise suppression. We concatenate this single DNN several times without any retraining to provide sufficient noise attenuation. Simulation results show that our proposed CI-DNN outperforms enhancement methods using classical spectral weighting rules w.r.t. total speech quality and speech intelligibility. Moreover, our approach shows similar or even slightly better performance, with far fewer trainable parameters, compared with a noisy-target single DNN approach of the same size. A comparison to the conventional clean-target single DNN approach shows that our proposed CI-DNN is better in speech component quality and much better in residual noise component quality. Most importantly, among the tested deep learning approaches, our new CI-DNN generalizes best to an unseen noise type.
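The abstract outlines the CI-DNN architecture in words only; the following is a minimal Python sketch of the concatenation idea, under stated assumptions: `mask_net` is a hypothetical, already-trained mask-estimation model (not provided with the paper) that maps a magnitude spectrogram to a real-valued mask in [0, 1] of the same shape, and the number of stages is a free choice here.

    import numpy as np

    def ci_dnn_enhance(noisy_stft, mask_net, num_stages=3):
        """Apply one identical mask-estimation DNN several times (CI-DNN idea).

        noisy_stft : complex ndarray of shape (frames, bins), STFT of the noisy speech.
        mask_net   : hypothetical trained model; magnitude spectrogram in,
                     real-valued mask in [0, 1] of the same shape out.
        num_stages : how often the identical DNN is concatenated (no retraining).
        """
        enhanced = noisy_stft.copy()
        for _ in range(num_stages):
            mask = mask_net(np.abs(enhanced))   # estimate a mask from the current magnitude
            enhanced = mask * enhanced          # apply the mask; each pass adds a moderate SNR gain
        return enhanced

Because every stage reuses the same weights, the trainable parameter count remains that of a single DNN regardless of how many stages are concatenated, which is the property the abstract contrasts with a same-size noisy-target single DNN.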
Pages: 5