DNN-Based Voice Activity Detection with Multi-Task Learning

Times cited: 31
Authors
Kang, Tae Gyoon [1, 2]
Kim, Nam Soo [1, 2]
Affiliations
[1] Seoul Natl Univ, Dept Elect & Comp Engn, Seoul 151742, South Korea
[2] Seoul Natl Univ, Inst New Media & Commun, Seoul 151742, South Korea
Funding
National Research Foundation, Singapore;
Keywords
deep neural network; voice activity detection; multi-task learning; NETWORKS;
DOI
10.1587/transinf.2015EDL8168
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
Recently, notable improvements in voice activity detection (VAD) have been achieved by adopting several machine learning techniques. Among them, the deep neural network (DNN), which learns the mapping between noisy speech features and the corresponding voice activity status through its deep hidden structure, has been one of the most popular. In this letter, we propose a novel approach that enhances the robustness of the DNN in mismatched noise conditions by means of a multi-task learning (MTL) framework. In the proposed algorithm, a feature enhancement task for speech features is trained jointly with the conventional VAD task. The experimental results show that the DNN trained with the proposed framework outperforms the conventional DNN-based VAD algorithm.
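The joint training described in the abstract can be illustrated with a minimal sketch. The abstract does not give the network sizes, activation functions, loss functions, or task weighting, so everything below is an assumption made for illustration: a single shared tanh hidden layer, a sigmoid VAD head scored with binary cross-entropy, a linear enhancement head scored with mean squared error, and a hypothetical weight `enh_weight` that balances the two tasks. The names `MultiTaskVAD` and `joint_loss` are likewise hypothetical, and only the forward pass and the combined loss are shown, not the training loop.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one affine layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for a layer produced by init_layer."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class MultiTaskVAD:
    """Hypothetical two-headed network: one shared layer, two task outputs."""

    def __init__(self, n_feat, n_hidden, seed=0):
        rng = random.Random(seed)
        self.shared = init_layer(n_feat, n_hidden, rng)    # used by both tasks
        self.vad_head = init_layer(n_hidden, 1, rng)       # speech / non-speech
        self.enh_head = init_layer(n_hidden, n_feat, rng)  # clean-feature estimate

    def forward(self, noisy):
        hidden = [math.tanh(z) for z in affine(self.shared, noisy)]
        speech_prob = sigmoid(affine(self.vad_head, hidden)[0])
        enhanced = affine(self.enh_head, hidden)
        return speech_prob, enhanced


def joint_loss(speech_prob, enhanced, label, clean, enh_weight=0.5):
    """Assumed MTL objective: VAD cross-entropy plus weighted enhancement MSE."""
    eps = 1e-12  # guards against log(0)
    bce = -(label * math.log(speech_prob + eps)
            + (1 - label) * math.log(1.0 - speech_prob + eps))
    mse = sum((e - c) ** 2 for e, c in zip(enhanced, clean)) / len(clean)
    return bce + enh_weight * mse, bce, mse


# Toy frame: made-up noisy features, their clean counterparts, and a speech label.
model = MultiTaskVAD(n_feat=4, n_hidden=8)
noisy = [0.9, -0.2, 0.4, 0.1]
clean = [1.0, 0.0, 0.5, 0.0]
prob, enhanced = model.forward(noisy)
total, bce, mse = joint_loss(prob, enhanced, label=1, clean=clean)
print(f"P(speech)={prob:.3f}  total={total:.3f}  bce={bce:.3f}  mse={mse:.3f}")
```

In this sketch both loss terms send gradients through the shared layer, which is the mechanism by which an auxiliary enhancement task could regularize the VAD task. Whether the letter shares all hidden layers or weights the tasks this way is not stated in the abstract.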
Pages: 550-553
Number of pages: 4