Multi-task Learning Deep Neural Networks For Speech Feature Denoising

Times cited: 0
Authors
Huang, Bin [1 ]
Ke, Dengfeng [2 ]
Zheng, Hao [2 ]
Xu, Bo [2 ]
Xu, Yanyan [1 ]
Su, Kaile [3 ]
Affiliations
[1] Beijing Forestry Univ, Sch Informat Sci & Technol, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Griffith Univ, Inst Integrated & Intelligent Syst, Brisbane, Qld, Australia
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
multi-task learning; feature denoising; deep neural networks; enhancement
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Traditional automatic speech recognition (ASR) systems usually suffer a sharp performance drop when noise is present in speech. To build a robust ASR system, we introduce a new model that uses multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task at the feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBMs) and fine-tuned by jointly learning multiple interactive tasks with a shared representation. For multi-task learning, we choose a noisy-clean speech pair fitting task as the primary task and separately explore two constraints as secondary tasks: phone labels and phone clusters. In experiments, the denoised speech is reconstructed by the MTL-DNN from noisy input and evaluated by both a DNN-hidden Markov model (HMM) based and a Gaussian mixture model (GMM)-HMM based ASR system. Results show that, using the denoised speech, the word error rate (WER) is reduced by 53.14% and 34.84%, respectively, compared with the baselines. The MTL-DNN model also outperforms a conventional single-task learning deep neural network (STL-DNN) model, with performance improvements of 4.93% and 3.88%, respectively.
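The training setup the abstract describes, a primary noisy-to-clean feature fitting task that shares hidden layers with a secondary phone-label classification task, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the single ReLU hidden layer, the task weight `lam`, and all function names are assumptions, and the RBM pre-training stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper)
d_in, d_hid, d_feat, n_phones = 40, 64, 40, 10

# Shared representation: one hidden layer used by both tasks
W_shared = rng.normal(scale=0.1, size=(d_in, d_hid))
# Primary head: regress clean features from noisy input
W_fit = rng.normal(scale=0.1, size=(d_hid, d_feat))
# Secondary head: classify the phone label (the first secondary task)
W_phone = rng.normal(scale=0.1, size=(d_hid, n_phones))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mtl_loss(x_noisy, x_clean, phone_id, lam=0.1):
    """Combined MTL objective: fitting MSE plus weighted phone cross-entropy."""
    h = np.maximum(0.0, x_noisy @ W_shared)   # shared hidden activations
    x_hat = h @ W_fit                         # denoised feature estimate
    p = softmax(h @ W_phone)                  # phone posteriors
    mse = np.mean((x_hat - x_clean) ** 2)     # primary task loss
    ce = -np.log(p[np.arange(len(phone_id)), phone_id] + 1e-12).mean()
    return mse + lam * ce, x_hat

# Toy mini-batch of 8 frames
x_noisy = rng.normal(size=(8, d_in))
x_clean = rng.normal(size=(8, d_feat))
phones = rng.integers(0, n_phones, size=8)
loss, denoised = mtl_loss(x_noisy, x_clean, phones)
```

Both heads backpropagate through `W_shared` during fine-tuning, which is what lets the phone-label constraint regularize the denoising mapping; at test time only the regression head is used to reconstruct the denoised features.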
Pages: 2464-2468
Page count: 5