KL-divergence Regularized Deep Neural Network Adaptation for Low-resource Speaker-dependent Speech Enhancement

Cited by: 1
Authors
Chai, Li [1]
Du, Jun [2]
Lee, Chin-Hui [3]
Affiliations
[1] Univ Sci & Technol China, Sch Data Sci, Hefei, Anhui, Peoples R China
[2] Univ Sci & Technol China, Hefei, Anhui, Peoples R China
[3] Georgia Inst Technol, Atlanta, GA 30332 USA
Source
INTERSPEECH 2019 | 2019
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
speaker-dependent speech enhancement; deep neural network; maximum likelihood; conditional target distribution; Kullback-Leibler divergence regularization; NOISE; SEPARATION
DOI
10.21437/Interspeech.2019-2426
Chinese Library Classification (CLC) number
R36 [Pathology]; R76 [Otorhinolaryngology]
Discipline classification code
100104; 100213
Abstract
In this paper, we propose a Kullback-Leibler divergence (KLD) regularized approach for adapting a speaker-independent (SI) speech enhancement model based on regression deep neural networks (DNNs) to a speaker-dependent (SD) model using a tiny amount of speaker-specific adaptation data. The algorithm adapts the DNN model conservatively by forcing the conditional target distribution estimated from the SD model to stay close to that from the SI model. The constraint is realized by adding KLD regularization to our previously proposed maximum likelihood objective function. Experimental results demonstrate that, even with only 10 seconds of SD adaptation data, the proposed framework consistently improves speech intelligibility under all 15 unseen noise types evaluated and at all signal-to-noise ratio levels for all 8 test speakers from the WSJ0 evaluation set.
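The idea of adding a KLD term to a maximum likelihood objective can be illustrated with a minimal sketch. It is not the paper's exact formulation: it assumes the conditional target distribution is a Gaussian with a shared diagonal variance centered on the DNN output, and it assumes an interpolation weight `rho` between the two terms. The function names, `rho`, and the variance handling are all hypothetical choices made for illustration.

```python
import numpy as np

def gaussian_nll(target, mu, var):
    """Per-frame negative log-likelihood of target under N(mu, diag(var))."""
    return 0.5 * np.sum(np.log(2 * np.pi * var) + (target - mu) ** 2 / var, axis=-1)

def gaussian_kl(mu_p, mu_q, var):
    """Per-frame KL(N(mu_p, diag(var)) || N(mu_q, diag(var))).

    With a shared variance the KL reduces to a variance-scaled squared distance.
    """
    return 0.5 * np.sum((mu_p - mu_q) ** 2 / var, axis=-1)

def kld_regularized_loss(target, mu_sd, mu_si, var, rho):
    """Assumed objective: (1 - rho) * NLL(SD model) + rho * KL(SI || SD).

    target : clean-speech features, shape (frames, dims)
    mu_sd  : outputs of the SD model being adapted
    mu_si  : outputs of the frozen SI model on the same inputs
    var    : per-dimension variance, shape (dims,)
    rho    : hypothetical regularization weight in [0, 1]
    """
    nll = gaussian_nll(target, mu_sd, var).mean()
    kld = gaussian_kl(mu_si, mu_sd, var).mean()
    return (1.0 - rho) * nll + rho * kld
```

Under these assumptions, `rho = 0` recovers plain maximum likelihood adaptation, while larger values of `rho` pull the SD outputs toward the SI outputs, which is the conservative behavior the abstract describes for very small adaptation sets.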
Pages: 1806-1810
Page count: 5