Exploiting Temporal-Unrolled Parallelism for Energy-Efficient SNN Acceleration

Times Cited: 0
Authors
Liu, Fangxin [1,2]
Wang, Zongwu [1,2]
Zhao, Wenbo [3]
Yang, Ning [1,2]
Chen, Yongbiao [1,2]
Huang, Shiyuan [1,2]
Li, Haomin [1,2]
Yang, Tao [4]
Pei, Songwen [5]
Liang, Xiaoyao [1]
Jiang, Li [1,2]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Qi Zhi Inst, Shanghai 200232, Peoples R China
[3] PDD Holdings Inc, Atlanta, GA 30328 USA
[4] Huawei Technol Co Ltd, Shenzhen 518129, Peoples R China
[5] Univ Shanghai Sci & Technol, Comp Sci & Engn Dept, Shanghai 200093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Neurons; Parallel processing; Membrane potentials; Hardware; Energy efficiency; Encoding; Computer architecture; Artificial neural networks; SW/HW co-design; spiking neural networks; NEURAL-NETWORK; BACKPROPAGATION; HARDWARE; SYSTEM;
DOI
10.1109/TPDS.2024.3415712
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Discipline Classification Code
081202;
Abstract
Event-driven spiking neural networks (SNNs) have demonstrated significant potential for high energy and area efficiency. However, existing SNN accelerators suffer from high latency and energy consumption because their accumulation-comparison operations execute serially: an SNN neuron integrates incoming spikes into its membrane potential and emits an output spike only when that potential exceeds a threshold, so each time step depends on the previous one. One way to address this is to exploit the sparsity of SNN spikes to reduce the number of time steps, but doing so can unbalance the workloads among neurons and limit the utilization of processing elements (PEs). In this paper, we present SATO, a temporal-parallel SNN accelerator that accumulates membrane potential for all time steps in parallel. SATO adopts a two-stage pipeline that decouples the neuron computation into distinct stages, preserving accuracy while exposing fine-grained parallelism: spike accumulation for every time step can execute concurrently on modern parallel hardware, which both raises the overall efficiency of the accelerator and reduces latency. The SATO architecture includes a novel binary adder-search tree that generates the output spike train while breaking the chronological dependence in the accumulation-comparison operation. Furthermore, SATO employs a bucket-sort-based method to distribute compressed workloads evenly across all PEs, maximizing the data locality of input spike trains. Experimental results on various SNN models show that SATO outperforms the 8-bit version of the well-known accelerator "Eyeriss" by 20.7x in speedup with 6.0x energy savings, on average. Compared with the state-of-the-art SNN accelerator "SpinalFlow", SATO achieves a 4.6x performance gain and a 3.1x energy reduction on average for inference.
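To make the serial bottleneck and its temporal-parallel rewrite concrete, the following is a minimal Python sketch, not the paper's implementation: the function names, the soft-reset convention, and the toy load balancer are illustrative assumptions. It assumes a non-leaky integrate-and-fire neuron with reset-by-subtraction and non-negative per-step inputs below the threshold; under those assumptions the cumulative spike count after step t equals floor(P_t / V_th), where P_t is the prefix sum of the inputs, so the prefix sums can be produced by a parallel adder (scan) tree and every per-step threshold comparison becomes independent. A small bucket-sort balancer illustrates the workload-distribution idea.

import numpy as np

V_TH = 1.0  # illustrative firing threshold

def serial_if_neuron(x, v_th=V_TH):
    # Baseline: serial accumulate-compare with reset-by-subtraction.
    # Step t cannot begin until step t-1 finishes -- the chronological
    # dependence described in the abstract.
    u, spikes = 0.0, np.zeros(len(x), dtype=np.int8)
    for t, xt in enumerate(x):
        u += xt            # integrate the input current
        if u >= v_th:      # compare against the threshold
            spikes[t] = 1
            u -= v_th      # soft reset (reset-by-subtraction)
    return spikes

def temporal_parallel_if_neuron(x, v_th=V_TH):
    # Temporal-parallel variant: the prefix sums over time map onto a
    # parallel adder (scan) tree; afterwards each per-step comparison
    # is independent of every other step.
    p = np.cumsum(x)                    # potential before any reset
    k = np.floor(p / v_th)              # cumulative spike count per step
    return (np.diff(k, prepend=0.0) > 0).astype(np.int8)

def bucket_balance(spike_counts, num_pes):
    # Toy bucket-sort balancer: bucket neurons by compressed workload
    # (spike count), then deal them out round-robin so every PE gets a
    # near-equal mix of light and heavy neurons.
    buckets = [[] for _ in range(max(spike_counts) + 1)]
    for neuron, count in enumerate(spike_counts):
        buckets[count].append(neuron)   # O(n) bucket sort by count
    ordered = [n for b in buckets for n in b]
    return [ordered[pe::num_pes] for pe in range(num_pes)]

# The two neuron variants agree under the stated assumptions:
rng = np.random.default_rng(0)
x = rng.uniform(0.0, V_TH, size=16)     # per-step inputs in [0, v_th)
assert (serial_if_neuron(x) == temporal_parallel_if_neuron(x)).all()

With leaky neurons or inhibitory (negative) inputs the closed form floor(P_t / V_th) no longer holds, which is exactly the remaining chronological dependence that the paper's binary adder-search tree is designed to break in hardware.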
Pages: 1749-1764
Number of Pages: 16