Hardware-Software Codesign of Automatic Speech Recognition System for Embedded Real-Time Applications

Cited by: 29
Authors
Cheng, Octavian [1]
Abdulla, Waleed [1]
Salcic, Zoran [1]
Affiliation
[1] Univ Auckland, Dept Elect & Comp Engn, Auckland 1142, New Zealand
Keywords
Automatic speech recognition (ASR); embedded system; hardware-software codesign; real-time system; softcore-based system; design
DOI
10.1109/TIE.2009.2022520
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
We present a hardware-software coprocessing speech recognizer for real-time embedded applications. The system consists of a standard microprocessor and a hardware accelerator for Gaussian mixture model (GMM) emission probability calculation, implemented on a field-programmable gate array. The GMM accelerator is optimized for timing performance by exploiting data parallelism. To avoid a large memory requirement, the accelerator adopts a double-buffering scheme for accessing the acoustic parameters, with no assumption made about the access pattern of these parameters. Experiments on widely used benchmark data show that the real-time factor of the proposed system is 0.62, which is about three times faster than the pure software-based baseline system, while the word accuracy rate is preserved at 93.33%. As part of the recognizer, a new adaptive beam-pruning algorithm is also proposed and implemented, which further reduces the average real-time factor to 0.54 with a word accuracy rate of 93.16%. The proposed speech recognizer is suitable for integration into various types of voice-controlled applications.
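
For readers unfamiliar with the two quantities the abstract relies on, the following is a minimal Python sketch of a GMM emission log-probability and of the real-time factor. It is an illustration only, not the authors' accelerator design: the diagonal-covariance assumption, the function names `gmm_log_likelihood` and `real_time_factor`, and the parameter layout are all hypothetical and are not taken from the paper.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log emission probability of feature vector x under a GMM.

    Assumes diagonal covariances (an assumption of this sketch):
    `means` and `variances` hold one vector per mixture component.
    """
    log_terms = []
    for w, mu, var in zip(weights, means, variances):
        # Log of one weighted Gaussian component, summed over dimensions.
        log_p = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_terms.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))


def real_time_factor(processing_seconds, audio_seconds):
    """Processing time divided by audio duration; below 1.0 means faster than real time."""
    return processing_seconds / audio_seconds
```

Under this definition, a real-time factor of 0.62 would mean that, for example, 10 s of speech is decoded in about 6.2 s. The per-component and per-dimension loops are independent of one another, which is the kind of data parallelism a hardware accelerator could exploit.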
Pages: 850-859
Number of pages: 10