MULTI-LEVEL LANGUAGE MODELING AND DECODING FOR OPEN VOCABULARY END-TO-END SPEECH RECOGNITION

Cited by: 0
Authors
Hori, Takaaki [1]
Watanabe, Shinji [1]
Hershey, John R. [1]
Affiliations
[1] Mitsubishi Elect Res Labs, 201 Broadway, Cambridge, MA 02139 USA
Source
2017 IEEE AUTOMATIC SPEECH RECOGNITION AND UNDERSTANDING WORKSHOP (ASRU) | 2017
Keywords
End-to-end speech recognition; language modeling; decoding; connectionist temporal classification; attention decoder
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose a combination of character-based and word-based language models (LMs) in an end-to-end automatic speech recognition (ASR) architecture. In our prior work, we combined a character-based LSTM RNN-LM with a hybrid attention/connectionist temporal classification (CTC) architecture. The character LMs improved recognition accuracy to rival state-of-the-art DNN/HMM systems on Japanese and Mandarin Chinese tasks. Although a character-based architecture supports open-vocabulary recognition, character-based LMs generally underperform word LMs for languages with a small alphabet, such as English, because linguistic constraints are difficult to model across long sequences of characters. This paper presents a novel method for end-to-end ASR decoding with LMs at both the character and word level. Hypotheses are first scored with the character-based LM until a word boundary is encountered. Known words are then re-scored using the word-based LM, while the character-based LM provides scores for out-of-vocabulary words. On a standard Wall Street Journal (WSJ) task, we achieved a 5.6% word error rate (WER) on the Eval'92 test set using only the SI284 training set and WSJ text data, which is the best score reported for end-to-end ASR systems on this benchmark.
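The scoring rule described in the abstract can be illustrated with a minimal Python sketch. This is a simplified reading of the abstract, not the paper's exact formulation: the function name `multilevel_score`, the callback signatures, the space character as the word boundary, and the toy uniform character LM and unigram word LM are all assumptions made for illustration. The sketch accumulates character-LM log-probabilities while a word is being spelled; at a word boundary it swaps that provisional score for the word-LM score if the word is in the vocabulary, and otherwise keeps the character-LM score.

```python
import math
from typing import Callable, List, Sequence, Set


def multilevel_score(
    text: str,
    char_logp: Callable[[str, str], float],
    word_logp: Callable[[Sequence[str], str], float],
    vocab: Set[str],
    boundary: str = " ",
) -> float:
    """Log-domain score of `text` under a toy character/word LM combination."""
    total = 0.0               # running hypothesis score
    pending = 0.0             # char-LM score of the word being spelled (incl. boundary)
    word = ""                 # characters of the current word
    history: List[str] = []   # completed words, context for the word LM
    prefix = ""               # full character history, context for the char LM
    for ch in text:
        lp = char_logp(prefix, ch)
        total += lp
        pending += lp
        prefix += ch
        if ch != boundary:
            word += ch
            continue
        if word in vocab:
            # Known word: replace the provisional char-LM score with the word-LM score.
            total += word_logp(history, word) - pending
        # Out-of-vocabulary word: the char-LM score already in `total` is kept.
        if word:
            history.append(word)
        word, pending = "", 0.0
    return total


# Hypothetical toy models, used only to exercise the function.
CHAR_LOGP = math.log(1.0 / 27.0)       # uniform over a-z plus space
WORD_PROBS = {"the": 0.5, "cat": 0.25}  # unigram word LM


def uniform_char_logp(prefix: str, ch: str) -> float:
    return CHAR_LOGP


def unigram_word_logp(history: Sequence[str], word: str) -> float:
    return math.log(WORD_PROBS[word])


VOCAB = set(WORD_PROBS)
known = multilevel_score("the cat ", uniform_char_logp, unigram_word_logp, VOCAB)
mixed = multilevel_score("the zyx ", uniform_char_logp, unigram_word_logp, VOCAB)
```

In this sketch a trailing partial word (text that does not end with the boundary character) keeps its character-LM score, which mirrors the abstract's statement that hypotheses are scored at the character level until a word boundary is reached. A real decoder would apply the same bookkeeping per hypothesis inside beam search and combine the LM score with the acoustic scores.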
Pages: 287-293
Page count: 7