ACCELERATING LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION ON HETEROGENEOUS CPU-GPU PLATFORMS

Cited by: 0
Authors
Kim, Jungsuk [1]
Lane, Ian [1]
Affiliation
[1] Carnegie Mellon Univ, Pittsburgh, PA 15213 USA
Source
2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014
Keywords
Large Vocabulary Continuous Speech Recognition (LVCSR); Weighted Finite State Transducer (WFST); Graphics Processing Units (GPU);
DOI
Not available
Chinese Library Classification
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
While prior work has demonstrated the effectiveness of Graphics Processing Units (GPUs) for limited-vocabulary speech recognition, these methods were unsuitable for recognition with large language models. To overcome this limitation, we previously introduced a novel "on-the-fly rescoring" approach in which search is performed on the GPU over a WFST network composed with a unigram language model, and partial hypotheses are rescored on-the-fly using a large language model stored on the CPU. In this paper, we extend our previous algorithm so that on-the-fly rescoring can be performed over an H-level network composed with any n-gram language model, and show that using a longer language model history in the H-level network improves decoding speed. We demonstrate that large language models can be applied on-the-fly with no degradation in decoding speed, realizing an LVCSR system that performs recognition over 22x faster than a CPU implementation with no loss in recognition accuracy.
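The rescoring idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs entirely on the CPU, ignores the WFST and the GPU/CPU split, and uses a toy beam search. The names `small_lm`, `large_lm`, `decode`, and all probabilities are assumptions made up for illustration. The only point shown is the score correction applied when a word is hypothesized: the small-model log-probability used during search is subtracted and the large-model log-probability is added.

```python
import math

# Toy log-probabilities; every value here is illustrative, not from the paper.
UNIGRAM = {
    "recognize": math.log(0.2),
    "speech": math.log(0.3),
    "beach": math.log(0.5),
}
BIGRAM = {
    ("recognize", "speech"): math.log(0.9),
    ("recognize", "beach"): math.log(0.1),
}


def small_lm(word, history):
    """Stand-in for the small model used during search (a unigram here)."""
    return UNIGRAM[word]


def large_lm(word, history):
    """Stand-in for the large model: a bigram that falls back to the unigram."""
    prev = history[-1] if history else None
    return BIGRAM.get((prev, word), UNIGRAM[word])


def decode(steps, rescoring=True, beam=2):
    """Toy beam search over per-step (word, acoustic_log_score) candidates."""
    hyps = [(0.0, ())]  # (accumulated log score, word history)
    for candidates in steps:
        expanded = []
        for score, history in hyps:
            for word, acoustic in candidates:
                # First-pass score uses the small language model.
                new_score = score + acoustic + small_lm(word, history)
                if rescoring:
                    # Swap the small-LM score for the large-LM score.
                    new_score += large_lm(word, history) - small_lm(word, history)
                expanded.append((new_score, history + (word,)))
        # Keep only the best `beam` partial hypotheses.
        hyps = sorted(expanded, reverse=True)[:beam]
    return hyps[0]


if __name__ == "__main__":
    steps = [[("recognize", 0.0)], [("speech", -1.0), ("beach", -0.9)]]
    print(decode(steps, rescoring=False))
    print(decode(steps, rescoring=True))
```

In this toy input the unigram alone prefers "beach", while the rescored search prefers "speech", because the correction is applied to partial hypotheses before pruning rather than after decoding has finished.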
Pages: 5