Language Adaptive Multilingual CTC Speech Recognition

Cited by: 7
Authors
Mueller, Markus [1 ,2 ]
Stueker, Sebastian [1 ,2 ]
Waibel, Alex [1 ,2 ,3 ]
Affiliations
[1] Inst Anthropomat & Robot, Karlsruhe, Germany
[2] Karlsruhe Inst Technol, Karlsruhe, Germany
[3] Carnegie Mellon Univ, Language Technol Inst, Pittsburgh, PA 15213 USA
Source
SPEECH AND COMPUTER, SPECOM 2017 | 2017 / Vol. 10458
Keywords
Speech recognition; Low-resource; Multilingual training; Connectionist temporal classification;
DOI
10.1007/978-3-319-66429-3_47
Chinese Library Classification (CLC)
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
Recently, it has been demonstrated that speech recognition systems can achieve human parity. While much research targets resource-rich languages like English, there exists a long tail of languages for which no speech recognition systems yet exist. The major obstacle in building systems for new languages is the lack of available resources. In the past, several methods have been proposed to build systems under low-resource conditions by using data from additional source languages during training. While it has been shown that DNN/HMM hybrid setups trained in low-resource conditions benefit from additional data, we propose a similar technique using sequence-based neural network acoustic models with the Connectionist Temporal Classification (CTC) loss function. We demonstrate that setups with multilingual phone sets benefit from the addition of Language Feature Vectors (LFVs).
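The idea described in the abstract can be illustrated with a minimal sketch (PyTorch; not the authors' implementation): a Language Feature Vector is concatenated to every acoustic frame before a recurrent acoustic model that is trained with the CTC loss over a multilingual phone set. All names, layer sizes, and dimensions below (e.g. CTCAcousticModel, lfv_dim, num_phones) are illustrative assumptions.

    # Illustrative sketch only: BiLSTM acoustic model with frame-wise LFV
    # concatenation, trained with CTC loss. Dimensions are placeholders.
    import torch
    import torch.nn as nn

    class CTCAcousticModel(nn.Module):
        def __init__(self, feat_dim=40, lfv_dim=8, hidden=320, num_phones=100):
            super().__init__()
            # Each frame: acoustic features concatenated with the language feature vector.
            self.rnn = nn.LSTM(feat_dim + lfv_dim, hidden, num_layers=3,
                               bidirectional=True, batch_first=True)
            # One extra output unit for the CTC blank symbol.
            self.out = nn.Linear(2 * hidden, num_phones + 1)

        def forward(self, feats, lfv):
            # feats: (batch, time, feat_dim); lfv: (batch, lfv_dim)
            lfv_frames = lfv.unsqueeze(1).expand(-1, feats.size(1), -1)
            x = torch.cat([feats, lfv_frames], dim=-1)
            h, _ = self.rnn(x)
            return self.out(h).log_softmax(dim=-1)

    # Toy training step (blank index 0 is the PyTorch default for CTCLoss).
    model = CTCAcousticModel()
    ctc = nn.CTCLoss(blank=0, zero_infinity=True)
    feats = torch.randn(2, 200, 40)           # two utterances, 200 frames each
    lfv = torch.randn(2, 8)                   # per-utterance language feature vectors
    targets = torch.randint(1, 101, (2, 30))  # phone label sequences (labels 1..100)
    log_probs = model(feats, lfv)             # (batch, time, num_phones + 1)
    loss = ctc(log_probs.transpose(0, 1),     # CTCLoss expects (time, batch, classes)
               targets,
               input_lengths=torch.full((2,), 200, dtype=torch.long),
               target_lengths=torch.full((2,), 30, dtype=torch.long))
    loss.backward()

In this sketch the LFV acts as a conditioning input that lets a single multilingual network adapt its outputs to the language of the utterance; how the LFVs are obtained and combined in the paper itself is described in the full text, not here.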
Pages: 473-482 (10 pages)