MULTILINGUAL TRAINING OF DEEP NEURAL NETWORKS

Cited by: 0

Authors:

Ghoshal, Arnab [1]

Swietojanski, Pawel [1]

Renals, Steve [1]

Affiliations:

[1] Univ Edinburgh, Ctr Speech Technol Res, Edinburgh EH8 9YL, Midlothian, Scotland

Source:

2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013

Funding:

UK Engineering and Physical Sciences Research Council (EPSRC);

Keywords:

Speech recognition; deep learning; neural networks; multilingual modeling;

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics];

Discipline classification code:

070206 ; 082403 ;

Abstract:

We investigate multilingual modeling in the context of a deep neural network (DNN) - hidden Markov model (HMM) hybrid, where the DNN outputs are used as the HMM state likelihoods. By viewing neural networks as a cascade of feature extractors followed by a logistic regression classifier, we hypothesise that the hidden layers, which act as feature extractors, will be transferable between languages. As a corollary, we propose that training the hidden layers on multiple languages makes them more suitable for such cross-lingual transfer. We experimentally confirm these hypotheses on the GlobalPhone corpus using seven languages from three different language families: Germanic, Romance, and Slavic. The experiments demonstrate substantial improvements over a monolingual DNN-HMM hybrid baseline, and hint at avenues of further exploration.
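The architecture the abstract describes, hidden layers shared across languages and feeding a per-language classifier, can be illustrated with a minimal forward-pass sketch. This is not the authors' implementation. The class name `MultilingualDNN`, the sigmoid hidden units, the layer sizes, the input dimension, and the language labels are all hypothetical choices made for illustration, and training is omitted entirely.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for a fully connected layer with small random weights."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def affine(layer, x):
    """Compute W x + b for one layer."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def sigmoid(z):
    return [1.0 / (1.0 + math.exp(-v)) for v in z]


def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


class MultilingualDNN:
    """Hypothetical sketch: shared hidden layers plus one softmax layer per language."""

    def __init__(self, n_in, hidden_sizes, states_per_language, seed=0):
        rng = random.Random(seed)
        sizes = [n_in] + list(hidden_sizes)
        # Hidden layers act as feature extractors and are shared by all languages.
        self.hidden = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        # Each language gets its own output layer over its own HMM states.
        self.output = {
            lang: make_layer(sizes[-1], n_states, rng)
            for lang, n_states in states_per_language.items()
        }

    def features(self, x):
        """Run the shared hidden layers only."""
        for layer in self.hidden:
            x = sigmoid(affine(layer, x))
        return x

    def posteriors(self, x, lang):
        """State posteriors for one language, computed from the shared features."""
        return softmax(affine(self.output[lang], self.features(x)))

    def add_language(self, lang, n_states, seed=1):
        """Attach a fresh output layer for a new language; hidden layers are reused as-is."""
        rng = random.Random(seed)
        n_hidden = len(self.hidden[-1][1])
        self.output[lang] = make_layer(n_hidden, n_states, rng)


# Usage with made-up sizes: two source languages, then cross-lingual reuse for a third.
model = MultilingualDNN(n_in=39, hidden_sizes=[64, 64],
                        states_per_language={"lang_a": 10, "lang_b": 12})
frame = [0.0] * 39  # one acoustic feature vector (dimension is an assumption)
model.add_language("lang_c", n_states=8)
target_posteriors = model.posteriors(frame, "lang_c")
```

In this sketch, cross-lingual transfer amounts to keeping `self.hidden` and replacing only the output layer, which mirrors the abstract's view of the hidden layers as language-independent feature extractors.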


Pages: 7319-7323

Page count: 5
