An 8.93-TOPS/W LSTM Recurrent Neural Network Accelerator Featuring Hierarchical Coarse-Grain Sparsity With All Parameters Stored On-Chip

Cited by: 19
Authors
Kadetotad, Deepak [1 ]
Berisha, Visar [1 ]
Chakrabarti, Chaitali [1 ]
Seo, Jae-Sun [1 ]
Affiliations
[1] Arizona State Univ, Sch Elect Comp & Energy Engn, Tempe, AZ 85281 USA
Source
IEEE SOLID-STATE CIRCUITS LETTERS | 2019, Vol. 2, No. 9
Keywords
Hardware accelerator; long short-term memory (LSTM); speech recognition; structured sparsity weight compression;
DOI
10.1109/LSSC.2019.2936761
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
Long short-term memory (LSTM) networks are widely used for speech applications but are difficult to implement efficiently in hardware because of their large weight storage requirements. We present an energy-efficient LSTM recurrent neural network (RNN) accelerator featuring an algorithm-hardware co-optimized memory compression technique called hierarchical coarse-grain sparsity (HCGS). Using HCGS-based block-wise recursive weight compression, we demonstrate LSTM networks with up to 16x fewer weights at minimal accuracy loss. The prototype chip, fabricated in 65-nm LP CMOS, achieves 8.93/7.22 TOPS/W for 2-/3-layer LSTM RNNs trained with HCGS on the TIMIT/TED-LIUM corpora.
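To make the block-wise compression concrete, below is a minimal NumPy sketch of a two-level (hierarchical) block-sparse weight mask in the spirit of HCGS. The block sizes (64 and 16), the 1-in-4 keep ratio at each level, and the names (hcgs_mask, blk1, keep1, ...) are illustrative assumptions for this sketch, not the configuration reported for the chip.

import numpy as np

def hcgs_mask(rows, cols, blk1=64, blk2=16, keep1=4, keep2=4, seed=0):
    # Level 1: tile the matrix into blk1 x blk1 blocks and, per block-row,
    # randomly keep 1-in-keep1 of the coarse blocks.
    # Level 2: tile each surviving block into blk2 x blk2 sub-blocks and,
    # per sub-block-row, randomly keep 1-in-keep2 of them.
    rng = np.random.default_rng(seed)
    mask = np.zeros((rows, cols), dtype=np.float32)
    for r1 in range(0, rows, blk1):
        coarse_cols = np.arange(0, cols, blk1)
        kept_coarse = rng.choice(coarse_cols,
                                 size=max(1, len(coarse_cols) // keep1),
                                 replace=False)
        for c1 in kept_coarse:
            for r2 in range(r1, min(r1 + blk1, rows), blk2):
                fine_cols = np.arange(c1, min(c1 + blk1, cols), blk2)
                kept_fine = rng.choice(fine_cols,
                                       size=max(1, len(fine_cols) // keep2),
                                       replace=False)
                for c2 in kept_fine:
                    mask[r2:r2 + blk2, c2:c2 + blk2] = 1.0
    return mask

# Example: mask for a 512x512 LSTM weight matrix.
mask = hcgs_mask(512, 512)
print("weight density:", mask.mean())  # ~0.0625, i.e., roughly 16x fewer weights

With a 1-in-4 selection at each of the two levels, the surviving weight density is roughly 1/16, matching the up-to-16x weight reduction quoted in the abstract. A regular block pattern like this also keeps index overhead low, since only block positions (rather than individual weight positions) need to be stored alongside the compressed weights.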
Pages: 119-122
Page count: 4
References (9 in total)
[1]  
Conti F., et al., Proc. IEEE Custom Integrated Circuits Conference (CICC), 2018
[2]   Convolutional networks for fast, energy-efficient neuromorphic computing [J].
Esser, Steven K. ;
Merolla, Paul A. ;
Arthur, John V. ;
Cassidy, Andrew S. ;
Appuswamy, Rathinakumar ;
Andreopoulos, Alexander ;
Berg, David J. ;
McKinstry, Jeffrey L. ;
Melano, Timothy ;
Barch, Davis R. ;
di Nolfo, Carmelo ;
Datta, Pallab ;
Amir, Arnon ;
Taba, Brian ;
Flickner, Myron D. ;
Modha, Dharmendra S. .
PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA, 2016, 113 (41) :11441-11446
[3]  
Giraldo J. S. P., et al., Proc. European Solid-State Circuits Conference (ESSCIRC), 2018, p. 166, DOI 10.1109/ESSCIRC.2018.8494342
[4]   ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA [J].
Han, Song ;
Kang, Junlong ;
Mao, Huizi ;
Hu, Yiming ;
Li, Xin ;
Li, Yubin ;
Xie, Dongliang ;
Luo, Hong ;
Yao, Song ;
Wang, Yu ;
Yang, Huazhong ;
Dally, William J. .
FPGA'17: PROCEEDINGS OF THE 2017 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS, 2017, :75-84
[5]   Efficient Memory Compression in Deep Neural Networks Using Coarse-Grain Sparsification for Speech Applications [J].
Kadetotad, Deepak ;
Arunachalam, Sairam ;
Chakrabarti, Chaitali ;
Seo, Jae-sun .
2016 IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN (ICCAD), 2016,
[6]   C-LSTM: Enabling Efficient LSTM using Structured Compression Techniques on FPGAs [J].
Wang, Shuo ;
Li, Zhe ;
Ding, Caiwen ;
Yuan, Bo ;
Qiu, Qinru ;
Wang, Yanzhi ;
Liang, Yun .
PROCEEDINGS OF THE 2018 ACM/SIGDA INTERNATIONAL SYMPOSIUM ON FIELD-PROGRAMMABLE GATE ARRAYS (FPGA'18), 2018, :11-20
[7]  
Wen W., et al., Proc. International Conference on Learning Representations (ICLR), 2018
[8]  
Xiong W, 2018, 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), P5934, DOI 10.1109/ICASSP.2018.8461870
[9]  
Yin S. Y., et al., Proc. Symposium on VLSI Circuits, 2017, p. C26, DOI 10.23919/VLSIC.2017.8008534