MULTI-TASK LEARNING IN DEEP NEURAL NETWORKS FOR IMPROVED PHONEME RECOGNITION

Cited by: 0
Authors
Seltzer, Michael L. [1]
Droppo, Jasha [1]
Affiliations
[1] Microsoft Res, Redmond, WA 98052 USA
Source
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013
Keywords
Acoustic model; speech recognition; multi-task learning; deep neural network; TIMIT;
DOI
Not available
Chinese Library Classification number
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
In this paper, we demonstrate how to improve the performance of deep neural network (DNN) acoustic models using multi-task learning. In multi-task learning, the network is trained to perform both the primary classification task and one or more secondary tasks using a shared representation. The additional model parameters associated with the secondary tasks represent a very small increase in the number of trained parameters and can be discarded at runtime. We explore three natural choices for the secondary task: the phone label, the phone context, and the state context. We demonstrate that, even on a strong baseline, multi-task learning can provide a significant decrease in error rate. Using phone context, the phonetic error rate (PER) on TIMIT is reduced from 21.63% to 20.25% on the core test set, surpassing the best performance in the literature for a DNN that uses a standard feed-forward network architecture.
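The shared-representation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the layer sizes, the sigmoid hidden units, the `secondary_weight` parameter, and the choice to combine the two tasks as a weighted sum of cross-entropy losses are all assumptions made for illustration. Both output heads read the same hidden layers, so the secondary head adds only one extra weight matrix and can be dropped at recognition time.

```python
import math
import random

def linear(x, W, b):
    """Affine map: one output per row of W."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def sigmoid_layer(x, W, b):
    """Affine map followed by an element-wise logistic sigmoid."""
    return [1.0 / (1.0 + math.exp(-z)) for z in linear(x, W, b)]

def log_softmax(z):
    """Numerically stable log-softmax over a list of logits."""
    m = max(z)
    lse = m + math.log(sum(math.exp(v - m) for v in z))
    return [v - lse for v in z]

def init(n_out, n_in, rng):
    """Small random weights and zero biases (illustrative initialisation)."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out

def multitask_loss(x, primary_label, secondary_label, params, secondary_weight=1.0):
    """Cross-entropy of both heads computed from one shared hidden representation.

    The weighted sum of the two losses is an assumption of this sketch.
    """
    h = x
    for W, b in params["shared"]:          # hidden layers shared by both tasks
        h = sigmoid_layer(h, W, b)
    primary_logp = log_softmax(linear(h, *params["primary"]))
    secondary_logp = log_softmax(linear(h, *params["secondary"]))
    loss = -primary_logp[primary_label] - secondary_weight * secondary_logp[secondary_label]
    return loss, primary_logp

# Hypothetical toy sizes; a real acoustic model is far larger.
rng = random.Random(0)
N_IN, N_HID, N_PRIMARY, N_SECONDARY = 8, 16, 6, 3
params = {
    "shared": [init(N_HID, N_IN, rng), init(N_HID, N_HID, rng)],
    "primary": init(N_PRIMARY, N_HID, rng),      # primary classification head
    "secondary": init(N_SECONDARY, N_HID, rng),  # e.g. a phone-label head, training only
}
x = [rng.gauss(0, 1) for _ in range(N_IN)]
loss, primary_logp = multitask_loss(x, primary_label=2, secondary_label=1, params=params)
```

At recognition time only `params["shared"]` and `params["primary"]` would be needed, which is the sense in which the secondary-task parameters can be discarded.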
Pages: 6965-6969
Number of pages: 5