MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS

Cited by: 0
Authors
Ghoshal, Arnab [1]
Swietojanski, Pawel [1]
Renals, Steve [1]
Affiliations
[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland
Source
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013
Funding
Engineering and Physical Sciences Research Council (UK)
Keywords
Speech recognition; deep learning; neural networks; multilingual modeling;
DOI
Not available
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206 ; 082403 ;
Abstract
We investigate multilingual modeling in the context of a deep neural network (DNN) - hidden Markov model (HMM) hybrid, where the DNN outputs are used as the HMM state likelihoods. By viewing neural networks as a cascade of feature extractors followed by a logistic regression classifier, we hypothesise that the hidden layers, which act as feature extractors, will be transferable between languages. As a corollary, we propose that training the hidden layers on multiple languages makes them more suitable for such cross-lingual transfer. We experimentally confirm these hypotheses on the GlobalPhone corpus using seven languages from three different language families: Germanic, Romance, and Slavic. The experiments demonstrate substantial improvements over a monolingual DNN-HMM hybrid baseline, and hint at avenues of further exploration.
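The architecture described in the abstract (hidden layers shared as feature extractors, topped by a per-language classifier) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the layer sizes, the sigmoid nonlinearity, the language labels, the per-language state counts, and all function names are assumptions chosen for illustration, and no training is performed.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Return small random weights and zero biases for one affine layer."""
    return rng.normal(0.0, 0.1, size=(n_in, n_out)), np.zeros(n_out)

def hidden_features(x, hidden_layers):
    """Shared feature extractor: a stack of sigmoid hidden layers."""
    h = x
    for W, b in hidden_layers:
        h = 1.0 / (1.0 + np.exp(-(h @ W + b)))
    return h

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def state_posteriors(x, hidden_layers, output_layer):
    """Language-specific softmax (logistic regression) on shared features."""
    W, b = output_layer
    return softmax(hidden_features(x, hidden_layers) @ W + b)

# Hypothetical sizes: 39-dim acoustic frames, two hidden layers of 64 units.
n_input, n_hidden = 39, 64
hidden_layers = [init_layer(n_input, n_hidden), init_layer(n_hidden, n_hidden)]

# One output layer per language; the state counts are placeholders.
output_layers = {
    "german": init_layer(n_hidden, 100),
    "polish": init_layer(n_hidden, 120),
}

frames = rng.normal(size=(5, n_input))  # stand-in for real acoustic features
post_de = state_posteriors(frames, hidden_layers, output_layers["german"])
post_pl = state_posteriors(frames, hidden_layers, output_layers["polish"])
```

In this sketch, cross-lingual transfer would amount to reusing `hidden_layers` unchanged and adding a new entry to `output_layers` for the target language.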
Pages: 7319-7323
Page count: 5