Rapid Adaptation for Deep Neural Networks through Multi-Task Learning

Cited by: 0
Authors
Huang, Zhen [1 ]
Li, Jinyu [2 ]
Siniscalchi, Sabato Marco [1 ,3 ]
Chen, I-Fan [1 ]
Wu, Ji [1 ,4 ]
Lee, Chin-Hui [1 ]
Affiliations
[1] Georgia Inst Technol, Sch ECE, Atlanta, GA 30332 USA
[2] Microsoft Corp, One Microsoft Way, Redmond, WA 98052 USA
[3] Kore Univ Enna, Dept Telemat, Enna, Italy
[4] Tsinghua Univ, Dept Elect Engn, Beijing, Peoples R China
Source
16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 | 2015
Keywords
deep neural networks; speaker adaptation; multi-task learning; CD-DNN-HMM; HIDDEN MARKOV-MODELS; SPEECH;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline codes
070206; 082403
Abstract
We propose a novel approach to addressing the adaptation effectiveness issue in parameter adaptation for deep neural network (DNN) based acoustic models for automatic speech recognition, by adding one or more small auxiliary output layers modeling broad acoustic units, such as mono-phones or tied-state (often called senone) clusters. In scenarios with a limited amount of available adaptation data, most senones are rarely seen or not observed at all, and consequently the ability to model them in a new condition is often not fully exploited. With the original senone classification task as the primary task, and auxiliary mono-phone/senone-cluster classification as the secondary tasks, multi-task learning (MTL) is employed to adapt the DNN parameters. With the proposed MTL adaptation framework, we improve the learning ability of the original DNN structure, enlarge the coverage of the acoustic space to deal with the unseen-senone problem, and thus enhance the discrimination power of the adapted DNN models. Experimental results on the 20,000-word open-vocabulary WSJ task demonstrate that the proposed framework consistently outperforms the conventional linear hidden layer adaptation schemes without MTL, providing a 5.4% relative reduction in word error rate (WERR) with only a single adaptation utterance, and a 10.7% WERR with 40 adaptation utterances, against the un-adapted DNN models.
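The MTL adaptation objective described in the abstract — a primary senone classification task sharing hidden layers with a small auxiliary mono-phone classification head — can be sketched as a weighted sum of two cross-entropy losses. The sketch below is a minimal NumPy illustration, not the paper's implementation: the layer sizes, the single shared hidden layer, and the interpolation weight `alpha` are all assumptions for demonstration, since the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over a logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical sizes, far smaller than a real CD-DNN-HMM.
n_in, n_hid = 40, 64
n_senones, n_monophones = 3000, 40   # primary vs. auxiliary output layers

# One shared hidden layer (the parameters being adapted) plus two output layers.
W_hid  = rng.normal(scale=0.1, size=(n_hid, n_in))
W_sen  = rng.normal(scale=0.1, size=(n_senones, n_hid))
W_mono = rng.normal(scale=0.1, size=(n_monophones, n_hid))  # small auxiliary head

def mtl_loss(x, senone_id, mono_id, alpha=0.5):
    """Weighted sum of the primary (senone) and secondary (mono-phone)
    cross-entropy losses; alpha is an assumed interpolation weight."""
    h = np.tanh(W_hid @ x)          # shared hidden representation
    p_sen = softmax(W_sen @ h)      # primary-task posterior
    p_mono = softmax(W_mono @ h)    # auxiliary-task posterior
    return -(alpha * np.log(p_sen[senone_id])
             + (1.0 - alpha) * np.log(p_mono[mono_id]))

x = rng.normal(size=n_in)           # one stand-in acoustic feature frame
loss = mtl_loss(x, senone_id=123, mono_id=7)
```

During adaptation, gradients of this combined loss would flow into the shared hidden layers, so even frames whose senone labels are rare still contribute supervision through the broad mono-phone targets — the mechanism the abstract credits for handling unseen senones.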
Pages: 3625-3629
Page count: 5