An Adaptation Method in Noise Mismatch Conditions for DNN-based Speech Enhancement

被引:1
作者
Xu Si-Ying [1 ]
Niu Tong [1 ]
Qu Dan [1 ]
Long Xing-Yan [1 ]
机构
[1] Natl Digital Switching Syst Engn & Technol R&D Ct, Zhengzhou, Henan, Peoples R China
来源
KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2018年 / 12卷 / 10期
关键词
Noise-aware Training; identity-vector; L-2; regularization; speech enhancement; DNN; condition mismatch; INTELLIGIBILITY; RECOGNITION; SUPPRESSION; SELECTION; MODEL;
D O I
10.3837/tiis.2018.10.017
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The deep learning based speech enhancement has shown considerable success. However, it still suffers performance degradation under mismatch conditions. In this paper, an adaptation method is proposed to improve the performance under noise mismatch conditions. Firstly, we advise a noise aware training by supplying identity vectors (i-vectors) as parallel input features to adapt deep neural network (DNN) acoustic models with the target noise. Secondly, given a small amount of adaptation data, the noise-dependent DNN is obtained by using L-2 regularization from a noise-independent DNN, and forcing the estimated masks to be close to the unadapted condition. Finally, experiments were carried out on different noise and SNR conditions, and the proposed method has achieved significantly 0.1%-9.6% benefits of STOI, and provided consistent improvement in PESQ and segSNR against the baseline systems.
引用
收藏
页码:4930 / 4951
页数:22
相关论文
共 51 条
  • [1] Abdel-Hamid O, 2013, INT CONF ACOUST SPEE, P7942, DOI 10.1109/ICASSP.2013.6639211
  • [2] Albesano D, 2006, IEEE IJCNN, P1554
  • [3] [Anonymous], P INT C AC SPEECH SI
  • [4] [Anonymous], 2013, COMPUT REV
  • [5] [Anonymous], P INT C LEARN REPR I
  • [6] Learning Deep Architectures for AI
    Bengio, Yoshua
    [J]. FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2009, 2 (01): : 1 - 127
  • [7] SUPPRESSION OF ACOUSTIC NOISE IN SPEECH USING SPECTRAL SUBTRACTION
    BOLL, SF
    [J]. IEEE TRANSACTIONS ON ACOUSTICS SPEECH AND SIGNAL PROCESSING, 1979, 27 (02): : 113 - 120
  • [8] Speech enhancement for non-stationary noise environments
    Cohen, I
    Berdugo, B
    [J]. SIGNAL PROCESSING, 2001, 81 (11) : 2403 - 2418
  • [9] Elastic-net regularization in learning theory
    De Mol, Christine
    De Vito, Ernesto
    Rosasco, Lorenzo
    [J]. JOURNAL OF COMPLEXITY, 2009, 25 (02) : 201 - 230
  • [10] Front-End Factor Analysis for Speaker Verification
    Dehak, Najim
    Kenny, Patrick J.
    Dehak, Reda
    Dumouchel, Pierre
    Ouellet, Pierre
    [J]. IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING, 2011, 19 (04): : 788 - 798