Efficient WFST-based one-pass decoding with on-the-fly hypothesis rescoring in extremely large vocabulary continuous speech recognition

Cited by: 122
Authors
Hori, Takaaki [1]
Hori, Chiori
Minami, Yasuhiro
Nakamura, Atsushi
Affiliations
[1] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
[2] MIT, Comp Sci & Artificial Intelligence Lab, Cambridge, MA 02139 USA
[3] NTT Corp, NTT Commun Sci Labs, Kyoto 6190237, Japan
Source
IEEE TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING | 2007 / Vol. 15 / No. 4
Keywords
on-the-fly composition; speech recognition; weighted finite-state transducer (WFST)
DOI
10.1109/TASL.2006.889790
Chinese Library Classification number
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
This paper proposes a novel one-pass search algorithm with on-the-fly composition of weighted finite-state transducers (WFSTs) for large-vocabulary continuous-speech recognition. In the standard search method with on-the-fly composition, two or more WFSTs are composed during decoding, and a Viterbi search is performed based on the composed search space. With this new method, a Viterbi search is performed based on the first of the two WFSTs. The second WFST is only used to rescore the hypotheses generated during the search. Since this rescoring is very efficient, the total amount of computation required by the new method is almost the same as when using only the first WFST. In a 65k-word vocabulary spontaneous lecture speech transcription task, our proposed method significantly outperformed the standard search method. Furthermore, our method was faster than decoding with a single fully composed and optimized WFST, while using only 38% of the memory required for decoding with the single WFST. Finally, we have achieved high-accuracy one-pass real-time speech recognition with an extremely large vocabulary of 1.8 million words.
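The idea in the abstract, searching with a first model and using a second model only to correct hypothesis scores as they are generated, can be illustrated with a toy sketch. This is not the paper's algorithm or data structures: the unigram table stands in for the first WFST, the bigram table for the second, and all names, costs, the backoff penalty, and the per-frame candidate format are assumptions made up for illustration.

```python
# Toy illustration only: costs are negative log probabilities (hypothetical values).
UNIGRAM = {"a": 1.0, "b": 1.5, "c": 2.0}            # stands in for the first model
BIGRAM = {("<s>", "a"): 0.5, ("a", "b"): 0.3,       # stands in for the second model
          ("a", "c"): 2.5, ("b", "c"): 0.4}
BACKOFF_PENALTY = 1.0                               # assumed penalty for unseen bigrams


def rescore_delta(prev_word, word):
    """Correction added on the fly: second-model cost minus the first-model cost already paid."""
    bigram_cost = BIGRAM.get((prev_word, word), UNIGRAM[word] + BACKOFF_PENALTY)
    return bigram_cost - UNIGRAM[word]


def decode(frames, beam=3):
    """frames: list of dicts mapping each candidate word to its acoustic cost."""
    # One hypothesis per second-model state (the previous word): (cost, word list).
    hyps = {"<s>": (0.0, [])}
    for candidates in frames:
        expanded = {}
        for prev, (cost, words) in hyps.items():
            for word, acoustic in candidates.items():
                # Search step driven by the first model only.
                first_pass = cost + acoustic + UNIGRAM[word]
                # Rescoring step: adjust the hypothesis with the second model.
                total = first_pass + rescore_delta(prev, word)
                # Viterbi-style recombination: keep the best hypothesis per state.
                if word not in expanded or total < expanded[word][0]:
                    expanded[word] = (total, words + [word])
        # Beam pruning: keep the lowest-cost hypotheses.
        hyps = dict(sorted(expanded.items(), key=lambda kv: kv[1][0])[:beam])
    cost, words = min(hyps.values(), key=lambda v: v[0])
    return words, cost


words, cost = decode([{"a": 1.0, "b": 0.9}, {"b": 1.0, "c": 0.8}])
```

With the made-up tables above, `decode` returns the word sequence `['a', 'b']`. The point of the sketch is that the expansion loop only ever consults the first model's costs; the second model contributes a single additive correction per emitted word rather than being composed into the search space.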
Pages: 1352-1365
Number of pages: 14