KL-divergence Regularized Deep Neural Network Adaptation for Low-resource Speaker-dependent Speech Enhancement

Cited by: 1

Authors:

Chai, Li [1]

Du, Jun [2]

Lee, Chin-Hui [3]

Affiliations:

[1] Univ Sci & Technol China, Sch Data Sci, Hefei, Anhui, Peoples R China

[2] Univ Sci & Technol China, Hefei, Anhui, Peoples R China

[3] Georgia Inst Technol, Atlanta, GA 30332 USA

Source:

INTERSPEECH 2019 | 2019

Funding:

National Natural Science Foundation of China; National Key Research and Development Program of China;

Keywords:

speaker-dependent speech enhancement; deep neural network; maximum likelihood; conditional target distribution; Kullback-Leibler divergence regularization; NOISE; SEPARATION;

DOI:

10.21437/Interspeech.2019-2426

Chinese Library Classification (CLC) number:

R36 [Pathology]; R76 [Otorhinolaryngology];

Discipline classification code:

100104 ; 100213 ;

Abstract:

In this paper, we propose a Kullback-Leibler divergence (KLD) regularized approach to adapting a speaker-independent (SI) speech enhancement model based on regression deep neural networks (DNNs) into a speaker-dependent (SD) model using a tiny amount of speaker-specific adaptation data. The algorithm adapts the DNN model conservatively by forcing the conditional target distribution estimated by the SD model to stay close to that estimated by the SI model. The constraint is realized by adding KLD regularization to our previously proposed maximum likelihood objective function. Experimental results demonstrate that, even with only 10 seconds of SD adaptation data, the proposed framework consistently achieves speech intelligibility improvements under all 15 unseen noise types evaluated and at all signal-to-noise ratio levels for all 8 test speakers from the WSJ0 evaluation set.
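The abstract does not spell out the objective, so the following is only a minimal sketch of how a KLD-regularized regression loss of this kind could look, not the authors' formulation. It assumes that both the SI and SD networks output the mean of a Gaussian conditional target distribution with a shared, fixed variance, in which case the KLD between the two distributions reduces to a scaled squared distance between the two predictions. The function name `kld_regularized_loss`, the interpolation weight `rho`, and the variance `var` are hypothetical and introduced here for illustration.

```python
import numpy as np


def kld_regularized_loss(sd_pred, si_pred, target, rho=0.5, var=1.0):
    """Illustrative loss: (1 - rho) * Gaussian NLL + rho * KL(SI || SD).

    Assumption of this sketch: both models predict the mean of a Gaussian
    with the same fixed per-dimension variance `var`.
    """
    sd_pred = np.asarray(sd_pred, dtype=float)
    si_pred = np.asarray(si_pred, dtype=float)
    target = np.asarray(target, dtype=float)

    # Data term: Gaussian negative log-likelihood of the clean target under
    # the SD model, dropping the additive constant.
    nll = 0.5 * np.sum((target - sd_pred) ** 2, axis=-1) / var

    # Regularizer: KL divergence between two equal-variance Gaussians,
    # which is a scaled squared distance between their means.
    kld = 0.5 * np.sum((si_pred - sd_pred) ** 2, axis=-1) / var

    # Average the weighted sum over frames.
    return float(np.mean((1.0 - rho) * nll + rho * kld))
```

Under these assumptions, `rho=0` recovers plain adaptation on the SD data, while `rho=1` ignores the adaptation targets and only pulls the SD model toward the SI model; intermediate values give the conservative adaptation the abstract describes.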

Pages: 1806-1810

Page count: 5
