Parallel implementation of Artificial Neural Network training for speech recognition

Cited by: 27

Authors:

Scanzio, Stefano [1]

Cumani, Sandro [1]

Gemello, Roberto [2]

Mana, Franco [2]

Laface, P. [1]

Affiliations:

[1] Politecnico di Torino, Dipartimento di Automatica e Informatica, I-10129 Turin, Italy

[2] Loquendo SpA, I-10148 Turin, Italy

Source:

PATTERN RECOGNITION LETTERS | 2010 / Vol. 31 / Issue 11

Keywords:

Artificial Neural Network; Block Back-propagation; Focused Attention Back-Propagation; GPU; CUDA; Fast Training;

DOI:

10.1016/j.patrec.2010.02.003

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

In this paper we describe the implementation of a complete ANN training procedure using the block mode back-propagation learning algorithm for sequential patterns, such as the observation feature vectors of a speech recognition system, exploiting the high performance SIMD architecture of GPUs through CUDA and its C-like language interface. We also compare it with the speed-up obtained by implementing the training procedure using only the multi-thread capabilities of multi-core processors. In our implementation we take into account all the peculiar aspects of training on large scale sequential patterns, in particular the re-segmentation of the training sentences, the block size for the feed-forward and for the back-propagation steps, and the transfer of huge amounts of data from host memory to the GPU card. Our approach has been tested by training acoustic models for large vocabulary speech recognition tasks, showing a sixfold reduction in the time required to train real-world large size networks with respect to an already optimized implementation using the Intel MKL libraries. Thanks to these optimizations and to the support of the GPU, the training time for a language with a huge set of training sentences (about one million for Italian) can be reduced from approximately a month to 5 days. (C) 2010 Elsevier B.V. All rights reserved.
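The block mode update named in the abstract can be illustrated with a small sketch. This is not the authors' CUDA implementation: it is a plain Python toy, assuming a one-hidden-layer sigmoid network and a squared-error loss, with hypothetical helper names (`init_weights`, `forward`, `train_block`, `mean_squared_error`). It only shows the defining idea, namely that gradients are accumulated over a block of consecutive patterns and the weights are updated once per block, which is what allows the per-pattern vector products to be batched into matrix products on a GPU.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_weights(n_in, n_hid, n_out, rng):
    """Small random weights; the last column of each row is the bias."""
    w_hid = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hid)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(n_hid + 1)] for _ in range(n_out)]
    return w_hid, w_out


def forward(w_hid, w_out, x):
    """Feed one pattern forward; a constant 1.0 input feeds the bias column."""
    h = [sigmoid(sum(w * v for w, v in zip(row, x + [1.0]))) for row in w_hid]
    y = [sigmoid(sum(w * v for w, v in zip(row, h + [1.0]))) for row in w_out]
    return h, y


def train_block(w_hid, w_out, block, lr):
    """Accumulate gradients over all patterns in the block, then update once."""
    g_hid = [[0.0] * len(row) for row in w_hid]
    g_out = [[0.0] * len(row) for row in w_out]
    for x, target in block:
        h, y = forward(w_hid, w_out, x)
        # Output deltas for squared error with sigmoid units.
        d_out = [(yk - tk) * yk * (1.0 - yk) for yk, tk in zip(y, target)]
        # Hidden deltas: propagate back through output weights (bias column excluded).
        d_hid = [
            hj * (1.0 - hj) * sum(d_out[k] * w_out[k][j] for k in range(len(w_out)))
            for j, hj in enumerate(h)
        ]
        for k, dk in enumerate(d_out):
            for j, v in enumerate(h + [1.0]):
                g_out[k][j] += dk * v
        for j, dj in enumerate(d_hid):
            for i, v in enumerate(x + [1.0]):
                g_hid[j][i] += dj * v
    # Single weight update per block, using the block-averaged gradient.
    scale = lr / len(block)
    for weights, grads in ((w_hid, g_hid), (w_out, g_out)):
        for row, grad_row in zip(weights, grads):
            for i in range(len(row)):
                row[i] -= scale * grad_row[i]


def mean_squared_error(w_hid, w_out, data):
    total = 0.0
    for x, target in data:
        _, y = forward(w_hid, w_out, x)
        total += sum((yk - tk) ** 2 for yk, tk in zip(y, target))
    return total / len(data)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for a sequence of feature vectors with targets (logical OR).
    data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [1.0]),
            ([1.0, 0.0], [1.0]), ([1.0, 1.0], [1.0])]
    w_hid, w_out = init_weights(2, 3, 1, rng)
    before = mean_squared_error(w_hid, w_out, data)
    block_size = 2  # hypothetical value; the paper treats block size as a tuning choice
    for _ in range(500):
        for start in range(0, len(data), block_size):
            train_block(w_hid, w_out, data[start:start + block_size], lr=0.5)
    print(before, mean_squared_error(w_hid, w_out, data))
```

A block size of 1 reduces this to per-pattern (online) back-propagation, and a block covering the whole data set reduces it to batch gradient descent; the block sizes the authors actually use are not given in this record.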


Pages: 1302-1309

Number of pages: 8

