An Adaptation Method in Noise Mismatch Conditions for DNN-based Speech Enhancement

Cited by: 1

Authors:

Xu Si-Ying [1]

Niu Tong [1]

Qu Dan [1]

Long Xing-Yan [1]

Affiliations:

[1] Natl Digital Switching Syst Engn & Technol R&D Ct, Zhengzhou, Henan, Peoples R China

Source:

KSII TRANSACTIONS ON INTERNET AND INFORMATION SYSTEMS | 2018 / Vol. 12 / No. 10

Keywords:

Noise-aware Training; identity-vector; L-2 regularization; speech enhancement; DNN; condition mismatch; INTELLIGIBILITY; RECOGNITION; SUPPRESSION; SELECTION; MODEL

DOI:

10.3837/tiis.2018.10.017

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Deep-learning-based speech enhancement has shown considerable success, but its performance still degrades under mismatched conditions. This paper proposes an adaptation method to improve performance under noise mismatch conditions. First, we propose noise-aware training, in which identity vectors (i-vectors) are supplied as parallel input features to adapt deep neural network (DNN) acoustic models to the target noise. Second, given a small amount of adaptation data, a noise-dependent DNN is obtained from a noise-independent DNN using L-2 regularization, which forces the estimated masks to stay close to those of the unadapted condition. Finally, experiments under different noise and SNR conditions show that the proposed method achieves significant STOI gains of 0.1%-9.6% and provides consistent improvements in PESQ and segSNR over the baseline systems.
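
The abstract does not state the exact formulation, so the following is only a rough sketch of the two ideas it names, under stated assumptions: that the i-vector is one fixed-length vector per utterance appended to every input frame, and that the adaptation objective is a mean-squared mask error plus a weighted L-2 penalty on the distance between adapted and unadapted masks. The function names `append_ivector` and `adaptation_loss` and the parameter `weight` are hypothetical and do not come from the paper.

```python
import numpy as np


def append_ivector(frames, ivector):
    """Append the same utterance-level i-vector to every feature frame.

    frames:  array of shape (num_frames, feat_dim)
    ivector: array of shape (ivec_dim,)
    Returns an array of shape (num_frames, feat_dim + ivec_dim).
    """
    tiled = np.tile(ivector, (frames.shape[0], 1))
    return np.concatenate([frames, tiled], axis=1)


def adaptation_loss(adapted_mask, target_mask, unadapted_mask, weight):
    """Assumed objective: mask error plus an L-2 penalty that keeps the
    adapted network's masks close to the unadapted network's masks."""
    fit = np.mean((adapted_mask - target_mask) ** 2)
    stay_close = np.mean((adapted_mask - unadapted_mask) ** 2)
    return fit + weight * stay_close


# Toy data: 4 frames of 3-dimensional features and a 2-dimensional i-vector.
frames = np.zeros((4, 3))
ivector = np.array([0.5, -0.5])
net_input = append_ivector(frames, ivector)
```

In this sketch a larger `weight` keeps the adapted model nearer to the noise-independent one, which is one plausible way to avoid overfitting when only a small amount of adaptation data is available.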


Pages: 4930-4951

Page count: 22

References: 51 in total

