Concatenated Identical DNN (CI-DNN) to Reduce Noise-Type Dependence in DNN-Based Speech Enhancement

Cited by: 2
Authors
Xu, Ziyi [1]
Strake, Maximilian [1]
Fingscheidt, Tim [1]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Inst Commun Technol, Braunschweig, Germany
Source
2019 27th European Signal Processing Conference (EUSIPCO) | 2019
Keywords
Speech enhancement; noise reduction; DNN; noisy speech target; DEEP NEURAL-NETWORK; EXCITATION; RATIO;
DOI
10.23919/eusipco.2019.8903066
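Chinese Library Classification number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract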
Estimating time-frequency domain masks for speech enhancement using deep learning approaches has recently become a popular field of research. In this paper, we propose a mask-based speech enhancement framework by using concatenated identical deep neural networks (CI-DNNs). The idea is that a single DNN is trained under multiple input and output signal-to-noise power ratio (SNR) conditions, using targets that provide a moderate SNR gain with respect to the input and therefore achieve a balance between speech component quality and noise suppression. We concatenate this single DNN several times without any retraining to provide enough noise attenuation. Simulation results show that our proposed CI-DNN outperforms enhancement methods using classical spectral weighting rules w.r.t. total speech quality and speech intelligibility. Moreover, our approach shows similar or even a little bit better performance with much fewer trainable parameters compared with a noisy-target single DNN approach of the same size. A comparison to the conventional clean-target single DNN approach shows that our proposed CI-DNN is better in speech component quality and much better in residual noise component quality. Most importantly, our new CI-DNN generalized best to an unseen noise type, if compared to the other tested deep learning approaches.
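The concatenation idea in the abstract can be illustrated with a minimal sketch: one mask estimator is applied several times in series, each stage taking the previous stage's output as its input. Everything here is an assumption for illustration only. `toy_mask` is a hypothetical Wiener-like gain standing in for the trained DNN, `noise_floor` and the stage count are arbitrary values, and the paper's actual network, features, and training targets are not reproduced.

```python
def toy_mask(frame, noise_floor=1.0):
    """Hypothetical stand-in for the trained DNN (NOT the paper's model).

    Returns one gain in [0, 1) per frequency bin; bins with low magnitude
    relative to noise_floor receive a smaller gain.
    """
    return [m * m / (m * m + noise_floor * noise_floor) for m in frame]


def ci_enhance(frame, mask_fn, num_stages):
    """Apply the same mask estimator num_stages times in series.

    Each stage estimates a mask from the current magnitudes and applies it,
    so the output of one stage is the input of the next. The estimator is
    identical in every stage and is never retrained.
    """
    enhanced = list(frame)
    for _ in range(num_stages):
        mask = mask_fn(enhanced)
        enhanced = [g * m for g, m in zip(mask, enhanced)]
    return enhanced


# One frame of spectral magnitudes (made-up values for illustration).
frame = [4.0, 0.5, 2.0, 0.1]
one_stage = ci_enhance(frame, toy_mask, 1)
three_stages = ci_enhance(frame, toy_mask, 3)
```

In this toy setting each additional stage lowers the low-magnitude bins further while the high-magnitude bins change comparatively little, which mirrors the abstract's description of a moderate per-stage gain that accumulates over the concatenation.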
Pages: 5