ACCELERATING LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION ON HETEROGENEOUS CPU-GPU PLATFORMS

Cited by: 0

Authors:

Kim, Jungsuk [1]

Lane, Ian [1]

Affiliations:

[1] Carnegie Mellon University, Pittsburgh, PA 15213, USA

Source:

2014 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) | 2014

Keywords:

Large Vocabulary Continuous Speech Recognition (LVCSR); Weighted Finite State Transducer (WFST); Graphics Processing Units (GPU)

DOI:

Not available

Chinese Library Classification (CLC) number:

O42 [Acoustics]

Subject classification codes:

070206; 082403

Abstract:

While prior works have demonstrated the effectiveness of Graphics Processing Units (GPUs) for limited-vocabulary speech recognition, these methods were unsuitable for recognition with large language models. To overcome this limitation, we previously introduced a novel "on-the-fly rescoring" approach in which search was performed on the GPU over a WFST network composed with a unigram language model, and partial hypotheses were rescored on-the-fly using a large language model stored on the CPU. In this paper, we extend our previous algorithm to enable on-the-fly rescoring to be performed over an H-level network composed with any n-gram language model, and show that using a longer language model history in the H-level network improves decoding speed. We demonstrate that large language models can be applied on-the-fly with no degradation in decoding speed, realizing an LVCSR system that performs recognition over 22x faster than a CPU implementation with no loss in recognition accuracy.
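The score correction behind on-the-fly rescoring can be illustrated with a small sketch. This is not the authors' implementation: their search runs on the GPU over a WFST network, whereas the code below is plain Python. The toy probabilities, the `BACKOFF` weight, and the function names `large_lm_logprob`, `rescore`, and `decode` are all hypothetical, chosen only to show the general idea of replacing the small-model score of a word with the large-model score once the word and its history are known.

```python
import math

# Hypothetical toy models; every probability here is made up for illustration.
UNIGRAM = {"the": 0.5, "cat": 0.25, "hat": 0.25}
BIGRAM = {("<s>", "the"): 0.8, ("the", "cat"): 0.6, ("the", "hat"): 0.4}
BACKOFF = 0.1  # hypothetical backoff weight for unseen bigrams


def large_lm_logprob(history, word):
    """Log-probability under the larger model, backing off to the unigram."""
    p = BIGRAM.get((history, word))
    if p is None:
        p = BACKOFF * UNIGRAM[word]
    return math.log(p)


def rescore(hyp_logprob, history, word):
    """Swap the unigram score already in the hypothesis for the large-model score."""
    return hyp_logprob - math.log(UNIGRAM[word]) + large_lm_logprob(history, word)


def decode(words, acoustic_logprobs):
    """Score one word sequence, rescoring each word as soon as it is emitted."""
    score, history = 0.0, "<s>"
    for word, acoustic in zip(words, acoustic_logprobs):
        # Search pass: acoustic score plus the small (unigram) model score.
        score += acoustic + math.log(UNIGRAM[word])
        # Rescoring pass: correct the partial hypothesis with the large model.
        score = rescore(score, history, word)
        history = word
    return score
```

In this sketch the unigram contribution cancels exactly, so the final score equals the acoustic score plus the large-model score. In an actual decoder the correction matters because it is applied to partial hypotheses while the search is still running, so pruning decisions see the better language model score.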

Pages: 5
