MULTI-TASK LEARNING IN DEEP NEURAL NETWORKS FOR IMPROVED PHONEME RECOGNITION

Times cited: 0

Authors:

Seltzer, Michael L. [1]

Droppo, Jasha [1]

Affiliation:

[1] Microsoft Research, Redmond, WA 98052, USA

Source:

2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) | 2013

Keywords:

Acoustic model; speech recognition; multi-task learning; deep neural network; TIMIT

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

In this paper we demonstrate how to improve the performance of deep neural network (DNN) acoustic models using multi-task learning. In multi-task learning, the network is trained to perform both the primary classification task and one or more secondary tasks using a shared representation. The additional model parameters associated with the secondary tasks represent a very small increase in the number of trained parameters, and they can be discarded at runtime. We explore three natural choices for the secondary task: the phone label, the phone context, and the state context. We demonstrate that, even on a strong baseline, multi-task learning can provide a significant decrease in error rate. Using phone context as the secondary task, the phonetic error rate (PER) on the TIMIT core test set is reduced from 21.63% to 20.25%, surpassing the best performance reported in the literature for a DNN that uses a standard feed-forward network architecture.
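The abstract describes the multi-task setup only in words. The following is a minimal sketch of the general idea, not the authors' implementation: one shared hidden representation feeds a primary softmax head and a secondary softmax head, the training loss sums both cross-entropies, and only the primary head is used at runtime. It assumes a single sigmoid hidden layer, no bias terms, plain per-example SGD, and a secondary-loss weight `alpha`; all of these are simplifications chosen for illustration. The names `MultiTaskDNN`, `W_h`, `W_p`, `W_s`, `train_toy`, and the toy data are hypothetical.

```python
import math
import random


def sigmoid(a):
    # Numerically stable logistic function.
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def softmax(z):
    m = max(z)  # subtract the max logit for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


class MultiTaskDNN:
    """Hypothetical sketch: one shared sigmoid layer feeding two softmax heads.

    W_p: primary head (e.g. HMM state labels); W_s: secondary head
    (e.g. phone-context classes). Biases are omitted for brevity.
    """

    def __init__(self, n_in, n_hidden, n_primary, n_secondary, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.W_h = init(n_hidden, n_in)         # shared representation
        self.W_p = init(n_primary, n_hidden)    # primary task output layer
        self.W_s = init(n_secondary, n_hidden)  # secondary task output layer

    def hidden(self, x):
        return [sigmoid(a) for a in matvec(self.W_h, x)]

    def predict(self, x):
        # Runtime uses only the primary head; W_s can be discarded after training.
        return softmax(matvec(self.W_p, self.hidden(x)))

    def train_step(self, x, y_p, y_s, lr=0.5, alpha=1.0):
        """One SGD step on L = CE(primary) + alpha * CE(secondary); returns L."""
        h = self.hidden(x)
        p = softmax(matvec(self.W_p, h))
        s = softmax(matvec(self.W_s, h))
        loss = -math.log(p[y_p]) - alpha * math.log(s[y_s])
        # dL/dlogits for softmax + cross-entropy is (probabilities - one-hot).
        d_p = [pi - (1.0 if i == y_p else 0.0) for i, pi in enumerate(p)]
        d_s = [alpha * (si - (1.0 if i == y_s else 0.0)) for i, si in enumerate(s)]
        # Both heads backpropagate into the shared layer (using pre-update weights).
        d_h = []
        for j, hj in enumerate(h):
            g = sum(d_p[i] * self.W_p[i][j] for i in range(len(d_p)))
            g += sum(d_s[i] * self.W_s[i][j] for i in range(len(d_s)))
            d_h.append(g * hj * (1.0 - hj))  # sigmoid derivative
        for W, d in ((self.W_p, d_p), (self.W_s, d_s)):
            for i, row in enumerate(W):
                for j, hj in enumerate(h):
                    row[j] -= lr * d[i] * hj
        for j, row in enumerate(self.W_h):
            for k, xk in enumerate(x):
                row[k] -= lr * d_h[j] * xk
        return loss


def train_toy(epochs=500):
    # Hypothetical toy data: (input frame, primary label, secondary label).
    data = [([1, 0, 0, 0], 0, 0), ([0, 1, 0, 0], 1, 0),
            ([0, 0, 1, 0], 2, 1), ([0, 0, 0, 1], 3, 1)]
    net = MultiTaskDNN(n_in=4, n_hidden=8, n_primary=4, n_secondary=2)
    # Summed multi-task loss over the toy set, one entry per epoch.
    losses = [sum(net.train_step(x, yp, ys) for x, yp, ys in data)
              for _ in range(epochs)]
    return net, data, losses
```

In this sketch the secondary head influences the final model only through the gradients it sends into the shared `W_h`; `predict` never touches `W_s`, which mirrors the abstract's point that the secondary-task parameters add little to training and can be dropped at runtime.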

Pages: 6965–6969

Number of pages: 5