EdgeDRNN: Recurrent Neural Network Accelerator for Edge Inference

Cited by: 29
Authors
Gao, Chang [1 ,2 ]
Rios-Navarro, Antonio [3 ]
Chen, Xi [1 ,2 ]
Liu, Shih-Chii [1 ,2 ]
Delbruck, Tobi [1 ,2 ]
Affiliations
[1] Univ Zurich, Inst Neuroinformat, CH-8057 Zurich, Switzerland
[2] Swiss Fed Inst Technol, CH-8057 Zurich, Switzerland
[3] Univ Seville, Robot & Technol Comp Lab, Seville 41012, Spain
Funding
Swiss National Science Foundation;
Keywords
Recurrent neural networks; Field programmable gate arrays; Memory management; Embedded systems; Hardware; Edge computing; Deep learning; FPGA; RNN; GRU; delta network;
DOI
10.1109/JETCAS.2020.3040300
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronics & Communication Technology];
Subject Classification Codes
0808; 0809;
Abstract
Low-latency, low-power portable recurrent neural network (RNN) accelerators offer powerful inference capabilities for real-time applications such as IoT, robotics, and human-machine interaction. We propose a lightweight Gated Recurrent Unit (GRU)-based RNN accelerator called EdgeDRNN that is optimized for low-latency edge RNN inference with a batch size of 1. EdgeDRNN adopts the spiking-neural-network-inspired delta network algorithm to exploit temporal sparsity in RNNs. Weights are stored in inexpensive DRAM, which enables EdgeDRNN to compute large multi-layer RNNs on the most inexpensive FPGAs. The sparse updates reduce DRAM weight memory access by a factor of up to 10x, and the delta threshold can be varied dynamically to trade off latency against accuracy. EdgeDRNN updates a 5-million-parameter 2-layer GRU-RNN in about 0.5 ms. It achieves latency comparable with a 92 W NVIDIA 1080 GPU and outperforms the NVIDIA Jetson Nano, Jetson TX2, and Intel Neural Compute Stick 2 in latency by 5x. For a batch size of 1, EdgeDRNN achieves a mean effective throughput of 20.2 GOp/s and a wall-plug power efficiency that is over 4x higher than that of the commercial edge AI platforms.
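A minimal NumPy sketch of the delta-thresholded matrix-vector product at the core of the delta network algorithm referred to above: input (or state) components whose change since the last propagated value falls below a threshold are skipped, so the corresponding weight columns never need to be fetched from DRAM. The names (delta_matvec, theta, x_prev_ref) and the accumulator-based formulation are illustrative assumptions, not the paper's exact implementation.

import numpy as np

def delta_matvec(W, x, x_prev_ref, acc, theta):
    # Change since the last value that was actually propagated.
    delta = x - x_prev_ref
    # Temporal-sparsity mask: only components that moved by >= theta fire.
    active = np.abs(delta) >= theta
    # Fetch and multiply only the weight columns of active inputs;
    # the skipped columns are the saved DRAM weight accesses.
    acc += W[:, active] @ delta[active]
    # Update the reference only for propagated components, so small
    # changes accumulate over time instead of being lost.
    x_prev_ref[active] = x[active]
    return acc

# Toy usage: the persistent accumulator `acc` stands in for a gate
# pre-activation that a delta RNN keeps between timesteps.
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8))
x_prev_ref = np.zeros(8)
acc = np.zeros(4)
for _ in range(3):
    x = rng.standard_normal(8)
    acc = delta_matvec(W, x, x_prev_ref, acc, theta=0.5)

Raising theta increases temporal sparsity (fewer active columns, hence lower latency and memory traffic) at the cost of approximation error, which is the latency-accuracy trade-off mentioned in the abstract.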
Pages: 419-432
Page count: 14