Spartus: A 9.4 TOp/s FPGA-Based LSTM Accelerator Exploiting Spatio-Temporal Sparsity

Cited by: 23
Authors
Gao, Chang [1 ,2 ]
Delbruck, Tobi [1 ,2 ]
Liu, Shih-Chii [1 ,2 ]
Affiliations
[1] University of Zurich, Institute of Neuroinformatics, Sensors Group, CH-8057 Zurich, Switzerland
[2] Swiss Federal Institute of Technology (ETH Zurich), CH-8057 Zurich, Switzerland
Keywords
Delta network; dropout; edge computing; recurrent neural network (RNN); spiking neural network; structured pruning; neural network accelerator
DOI
10.1109/TNNLS.2022.3180209
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Long short-term memory (LSTM) recurrent networks are frequently used for tasks involving time-sequential data, such as speech recognition. Unlike previous LSTM accelerators that exploit either spatial weight sparsity or temporal activation sparsity, this article proposes a new accelerator called "Spartus" that exploits spatio-temporal sparsity to achieve ultra-low-latency inference. Spatial sparsity is induced using a new column-balanced targeted dropout (CBTD) structured pruning method, which produces structured sparse weight matrices for a balanced workload. The pruned networks running on Spartus hardware achieve weight sparsity levels of up to 96% and 94% with negligible accuracy loss on the TIMIT and LibriSpeech datasets, respectively. To induce temporal sparsity in LSTMs, we extend the previous DeltaGRU method to a DeltaLSTM method. Combining the spatio-temporal sparsity from CBTD and DeltaLSTM saves weight memory accesses and the associated arithmetic operations. The Spartus architecture is scalable and supports real-time online speech recognition when implemented on small and large FPGAs. The Spartus per-sample latency for a single DeltaLSTM layer of 1024 neurons averages 1 μs. Exploiting spatio-temporal sparsity on our test LSTM network using the TIMIT dataset leads to a 46x speedup of Spartus over its theoretical hardware performance, achieving an effective batch-1 throughput of 9.4 TOp/s and a power efficiency of 1.1 TOp/s/W.
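To make the column-balanced pruning idea behind CBTD concrete, the following NumPy sketch zeroes the smallest-magnitude weights within each column sub-block so that every block of every column retains the same number of nonzeros, which is what gives the hardware a balanced per-column workload. This is a minimal illustration under assumed block sizes and a simple magnitude criterion, not the paper's targeted-dropout training procedure.

```python
import numpy as np

def column_balanced_prune(W, block_size, keep_per_block):
    """Zero the smallest-magnitude entries of each column sub-block.

    W              -- 2-D weight matrix (rows x cols)
    block_size     -- number of rows per column sub-block
    keep_per_block -- nonzeros retained in every sub-block, so all
                      columns end up with identical workload
    """
    W = W.copy()
    rows, cols = W.shape
    assert rows % block_size == 0, "rows must divide evenly into blocks"
    for c in range(cols):
        for r0 in range(0, rows, block_size):
            block = W[r0:r0 + block_size, c]
            # rank entries by magnitude; keep only the largest ones
            order = np.argsort(np.abs(block))
            block[order[:block_size - keep_per_block]] = 0.0
            W[r0:r0 + block_size, c] = block
    return W

# example: 75% structured sparsity with a balanced pattern per column block
W = np.random.randn(1024, 512).astype(np.float32)
W_sparse = column_balanced_prune(W, block_size=64, keep_per_block=16)
```

Because each column sub-block keeps exactly the same number of nonzeros, the processing elements that consume different columns finish at the same time, avoiding the load imbalance of unstructured pruning.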
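The temporal-sparsity side follows the delta-network principle that DeltaGRU introduced and DeltaLSTM extends to LSTMs: an input or hidden-state element is only propagated, and its associated weight column only fetched, when it has changed by more than a threshold since the last time it fired. Below is a minimal NumPy sketch of that delta-update rule; the function names, threshold handling, and accumulator interface are illustrative assumptions rather than the paper's exact formulation.

```python
import numpy as np

def delta_encode(x_t, x_prev, theta):
    """Delta-network update: emit changes larger than theta, else zero.

    x_t    -- current activation vector
    x_prev -- last *propagated* value of each element (the reference state)
    theta  -- delta threshold; larger values give more temporal sparsity
    Returns (delta, new_reference): delta is sparse, and elements whose
    change stayed below theta keep their old reference value.
    """
    diff = x_t - x_prev
    fire = np.abs(diff) >= theta          # which elements changed enough
    delta = np.where(fire, diff, 0.0)     # zeros can be skipped entirely
    x_ref = np.where(fire, x_t, x_prev)   # only fired elements update the state
    return delta, x_ref

def delta_matvec(W, delta, acc):
    """Accumulate W @ delta into a persistent pre-activation accumulator,
    touching only the weight columns that correspond to nonzero deltas."""
    nz = np.flatnonzero(delta)
    acc += W[:, nz] @ delta[nz]
    return acc
```

Because the delta vector is mostly zero for slowly varying speech features, the accelerator can skip the corresponding weight-column fetches and multiply-accumulates; this skipping, on top of the CBTD weight sparsity, is the source of the reported speedup over the dense theoretical throughput.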
Pages: 1098 - 1112
Page count: 15