A Low-Power, High-Performance Speech Recognition Accelerator

Cited by: 9
Authors
Yazdani, Reza [1]
Arnau, Jose-Maria [1]
Gonzalez, Antonio [1]
Affiliation
[1] Univ Politecn Cataluna, Dept Comp Architecture, ES-08034 Barcelona, Spain
Funding
European Union Horizon 2020
Keywords
Viterbi algorithm; Speech recognition; Graphics processing units; Acoustics; Central Processing Unit; Hardware; Decoding; Automatic Speech Recognition (ASR); Viterbi search; hardware accelerator; WFST; low-power architecture;
DOI
10.1109/TC.2019.2937075
Chinese Library Classification number
TP3 [Computing Technology, Computer Technology]
Discipline classification code
0812
Abstract
Automatic Speech Recognition (ASR) is becoming increasingly ubiquitous, especially in the mobile segment. Fast and accurate ASR comes at a high energy cost that tightly power-constrained mobile devices cannot afford. Hardware acceleration reduces the energy consumption of ASR systems while delivering high performance. In this paper, we present an accelerator for large-vocabulary, speaker-independent, continuous speech recognition. It focuses on the Viterbi search algorithm, which is the main bottleneck in an ASR system. The proposed design includes innovative techniques to improve the memory subsystem, since memory is the main bottleneck for performance and power in the design of these accelerators. It includes a prefetching scheme tailored to the needs of ASR systems that hides main memory latency for a large fraction of the memory accesses with negligible impact on area. Additionally, we introduce a novel bandwidth-saving technique that reduces off-chip memory accesses by 20 percent. Finally, we present a power-saving technique that significantly reduces the leakage power of the accelerator's scratchpad memories, providing a reduction of between 8.5 and 29.2 percent in total power dissipation. Overall, the proposed design outperforms implementations running on the CPU by orders of magnitude and achieves speedups between 1.7x and 5.9x for different speech decoders over a highly optimized CUDA implementation running on a GeForce GTX 980 GPU, while reducing energy by 123-454x.
Pages: 1817-1831
Number of pages: 15