Multi-task Learning Deep Neural Networks For Speech Feature Denoising

Times cited: 0
Authors
Huang, Bin [1 ]
Ke, Dengfeng [2 ]
Zheng, Hao [2 ]
Xu, Bo [2 ]
Xu, Yanyan [1 ]
Su, Kaile [3 ]
Affiliations
[1] Beijing Forestry Univ, Sch Informat Sci & Technol, Beijing, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Griffith Univ, Inst Integrated & Intelligent Syst, Brisbane, Qld, Australia
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
multi-task learning; feature denoising; deep neural networks; enhancement
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
Traditional automatic speech recognition (ASR) systems usually suffer a sharp performance drop when noise is present in speech. To build a robust ASR system, we introduce a new model that uses multi-task learning deep neural networks (MTL-DNN) to solve the speech denoising task at the feature level. In this model, the networks are initialized by pre-training restricted Boltzmann machines (RBMs) and fine-tuned by jointly learning multiple interactive tasks with a shared representation. For multi-task learning, we choose a noisy-clean speech pair fitting task as the primary task and separately explore two constraints as secondary tasks: phone labels and phone clusters. In experiments, the denoised speech is reconstructed by the MTL-DNN from noisy input and evaluated by both a DNN-hidden Markov model (HMM) based and a Gaussian mixture model (GMM)-HMM based ASR system. Results show that, using the denoised speech, the word error rate (WER) is reduced by 53.14% and 34.84%, respectively, compared with the baselines. The MTL-DNN model also outperforms a conventional single-task learning deep neural network (STL-DNN) model, with performance improvements of 4.93% and 3.88%, respectively.
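The training setup the abstract describes, a primary noisy-to-clean feature fitting task that shares hidden layers with a secondary phone-label classification task, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the single ReLU hidden layer, the task weight `lam`, and all function names are assumptions, and the RBM pre-training stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions (not taken from the paper)
d_in, d_hid, d_feat, n_phones = 40, 64, 40, 10

# Shared representation: one hidden layer used by both tasks
W_shared = rng.normal(scale=0.1, size=(d_in, d_hid))
# Primary head: regress clean features from noisy input
W_fit = rng.normal(scale=0.1, size=(d_hid, d_feat))
# Secondary head: classify the phone label (the first secondary task)
W_phone = rng.normal(scale=0.1, size=(d_hid, n_phones))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mtl_loss(x_noisy, x_clean, phone_id, lam=0.1):
    """Combined MTL objective: fitting MSE plus weighted phone cross-entropy."""
    h = np.maximum(0.0, x_noisy @ W_shared)   # shared hidden activations
    x_hat = h @ W_fit                         # denoised feature estimate
    p = softmax(h @ W_phone)                  # phone posteriors
    mse = np.mean((x_hat - x_clean) ** 2)     # primary task loss
    ce = -np.log(p[np.arange(len(phone_id)), phone_id] + 1e-12).mean()
    return mse + lam * ce, x_hat

# Toy mini-batch of 8 frames
x_noisy = rng.normal(size=(8, d_in))
x_clean = rng.normal(size=(8, d_feat))
phones = rng.integers(0, n_phones, size=8)
loss, denoised = mtl_loss(x_noisy, x_clean, phones)
```

Both heads backpropagate through `W_shared` during fine-tuning, which is what lets the phone-label constraint regularize the denoising mapping; at test time only the regression head is used to reconstruct the denoised features.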
Pages: 2464-2468
Page count: 5