MULTILINGUAL ACOUSTIC MODELS USING DISTRIBUTED DEEP NEURAL NETWORKS

Cited by: 0
Authors
Heigold, G. [1]
Vanhoucke, V. [1]
Senior, A. [1]
Nguyen, P. [1]
Ranzato, M. [1]
Devin, M. [1]
Dean, J. [1]
Affiliation
[1] Google Inc, Mountain View, CA 94043 USA
Source
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2013
Keywords
Speech recognition; parameter sharing; deep neural networks; multilingual training; distributed neural networks
DOI
Not available
Chinese Library Classification number
O42 [Acoustics]
Discipline classification code
070206; 082403
Abstract
Today's speech recognition technology is mature enough to be useful for many practical applications. In this context, it is of paramount importance to train accurate acoustic models for many languages within given resource constraints such as data, processing power, and time. Multilingual training has the potential to solve the data issue and close the performance gap between resource-rich and resource-scarce languages. Neural networks lend themselves naturally to parameter sharing across languages, and distributed implementations have made it feasible to train large networks. In this paper, we present experimental results for cross- and multi-lingual network training of eleven Romance languages on 10k hours of data in total. The average relative gains over the monolingual baselines are 4%/2% (data-scarce/data-rich languages) for cross-lingual and 7%/2% for multi-lingual training. However, the additional gain from jointly training the languages on all data comes at an increased training time of roughly four weeks, compared to two weeks (monolingual) and one week (crosslingual).
Pages: 8619-8623
Page count: 5