Towards end-to-end speech recognition with transfer learning

Cited by: 25
Authors
Qin, Chu-Xiong [1, 2]
Qu, Dan [1 ]
Zhang, Lian-Hai [1 ]
Affiliations
[1] Natl Digital Switching Syst Engn & Technol R&D Ct, Zhengzhou, Henan, Peoples R China
[2] Xidian Univ, State Key Lab Integrated Serv Networks, Xian, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Speech recognition; End-to-end; Transfer learning;
DOI
10.1186/s13636-018-0141-9
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
A transfer learning-based end-to-end speech recognition approach is presented at two levels in our framework. First, a feature extraction approach that combines multilingual deep neural network (DNN) training with a matrix factorization algorithm is introduced to extract high-level features. Second, the advantage of connectionist temporal classification (CTC) is transferred to the target attention-based model through a joint CTC-attention model composed of shallow recurrent neural networks (RNNs) on top of the proposed features. The experimental results show that the proposed transfer learning approach achieves the best performance among all end-to-end methods and is comparable to the state-of-the-art speech recognition system on TIMIT when further jointly decoded with an RNN language model.
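The joint CTC-attention training mentioned in the abstract is commonly written as an interpolation of the two losses. The block below is a sketch of that common multi-task formulation and is not necessarily the exact objective used by the authors. The weight $\lambda$ and the symbols $x$ (input feature sequence of length $T$), $y$ (target label sequence of length $U$), $\pi$ (frame-level CTC alignment), and $\mathcal{B}$ (the CTC collapsing map) are notational assumptions that do not appear in the abstract.

```latex
\begin{align}
\mathcal{L}_{\text{joint}} &= \lambda\,\mathcal{L}_{\text{CTC}} + (1-\lambda)\,\mathcal{L}_{\text{att}},
  \qquad 0 \le \lambda \le 1 \\
\mathcal{L}_{\text{CTC}} &= -\log \sum_{\pi \in \mathcal{B}^{-1}(y)} \prod_{t=1}^{T} p(\pi_t \mid x) \\
\mathcal{L}_{\text{att}} &= -\sum_{u=1}^{U} \log p(y_u \mid y_{<u},\, x)
\end{align}
```

Under this assumed form, $\lambda = 0$ reduces to a pure attention-based model and $\lambda = 1$ to a pure CTC model; intermediate values let the CTC alignment constraint regularize the attention decoder.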
Pages: 9