Exploiting Temporal-Unrolled Parallelism for Energy-Efficient SNN Acceleration

Times Cited: 0
Authors
Liu, Fangxin [1,2]
Wang, Zongwu [1,2]
Zhao, Wenbo [3]
Yang, Ning [1,2]
Chen, Yongbiao [1,2]
Huang, Shiyuan [1,2]
Li, Haomin [1,2]
Yang, Tao [4]
Pei, Songwen [5]
Liang, Xiaoyao [1]
Jiang, Li [1,2]
Affiliations
[1] Shanghai Jiao Tong Univ, Dept Comp Sci & Engn, Shanghai 200240, Peoples R China
[2] Shanghai Qi Zhi Inst, Shanghai 200232, Peoples R China
[3] PDD Holdings Inc, Atlanta, GA 30328 USA
[4] Huawei Technol Co Ltd, Shenzhen 518129, Peoples R China
[5] Univ Shanghai Sci & Technol, Comp Sci & Engn Dept, Shanghai 200093, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Neurons; Parallel processing; Membrane potentials; Hardware; Energy efficiency; Encoding; Computer architecture; Artificial neural networks; SW/HW co-design; spiking neural networks; NEURAL-NETWORK; BACKPROPAGATION; HARDWARE; SYSTEM;
DOI
10.1109/TPDS.2024.3415712
Chinese Library Classification (CLC)
TP301 [Theory and Methods];
Discipline Classification Code
081202;
Abstract
Event-driven spiking neural networks (SNNs) have demonstrated significant potential for high energy and area efficiency. However, existing SNN accelerators suffer from high latency and energy consumption because their accumulation-comparison operations execute serially: an SNN neuron integrates incoming spikes into its membrane potential and emits an output spike only when that potential exceeds a threshold, so each time step depends on the previous one. One way to address this is to exploit the sparsity of SNN spikes to reduce the number of time steps, but doing so can unbalance the workloads among neurons and limit the utilization of processing elements (PEs). In this paper, we present SATO, a temporal-parallel SNN accelerator that accumulates membrane potential for all time steps in parallel. SATO adopts a two-stage pipeline that decouples the neuron computation into distinct stages, preserving accuracy while exposing fine-grained parallelism: spike accumulation for every time step can execute concurrently on modern parallel hardware, which both raises the overall efficiency of the accelerator and reduces latency. The SATO architecture includes a novel binary adder-search tree that generates the output spike train while breaking the chronological dependence in the accumulation-comparison operation. Furthermore, SATO employs a bucket-sort-based method to distribute compressed workloads evenly across all PEs, maximizing the data locality of input spike trains. Experimental results on various SNN models show that SATO outperforms the 8-bit version of the well-known accelerator "Eyeriss" by 20.7x in speedup with 6.0x energy savings, on average. Compared with the state-of-the-art SNN accelerator "SpinalFlow", SATO achieves a 4.6x performance gain and a 3.1x energy reduction on average for inference.
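To make the serial bottleneck and its temporal-parallel rewrite concrete, the following is a minimal Python sketch, not the paper's implementation: the function names, the soft-reset convention, and the toy load balancer are illustrative assumptions. It assumes a non-leaky integrate-and-fire neuron with reset-by-subtraction and non-negative per-step inputs below the threshold; under those assumptions the cumulative spike count after step t equals floor(P_t / V_th), where P_t is the prefix sum of the inputs, so the prefix sums can be produced by a parallel adder (scan) tree and every per-step threshold comparison becomes independent. A small bucket-sort balancer illustrates the workload-distribution idea.

import numpy as np

V_TH = 1.0  # illustrative firing threshold

def serial_if_neuron(x, v_th=V_TH):
    # Baseline: serial accumulate-compare with reset-by-subtraction.
    # Step t cannot begin until step t-1 finishes -- the chronological
    # dependence described in the abstract.
    u, spikes = 0.0, np.zeros(len(x), dtype=np.int8)
    for t, xt in enumerate(x):
        u += xt            # integrate the input current
        if u >= v_th:      # compare against the threshold
            spikes[t] = 1
            u -= v_th      # soft reset (reset-by-subtraction)
    return spikes

def temporal_parallel_if_neuron(x, v_th=V_TH):
    # Temporal-parallel variant: the prefix sums over time map onto a
    # parallel adder (scan) tree; afterwards each per-step comparison
    # is independent of every other step.
    p = np.cumsum(x)                    # potential before any reset
    k = np.floor(p / v_th)              # cumulative spike count per step
    return (np.diff(k, prepend=0.0) > 0).astype(np.int8)

def bucket_balance(spike_counts, num_pes):
    # Toy bucket-sort balancer: bucket neurons by compressed workload
    # (spike count), then deal them out round-robin so every PE gets a
    # near-equal mix of light and heavy neurons.
    buckets = [[] for _ in range(max(spike_counts) + 1)]
    for neuron, count in enumerate(spike_counts):
        buckets[count].append(neuron)   # O(n) bucket sort by count
    ordered = [n for b in buckets for n in b]
    return [ordered[pe::num_pes] for pe in range(num_pes)]

# The two neuron variants agree under the stated assumptions:
rng = np.random.default_rng(0)
x = rng.uniform(0.0, V_TH, size=16)     # per-step inputs in [0, v_th)
assert (serial_if_neuron(x) == temporal_parallel_if_neuron(x)).all()

With leaky neurons or inhibitory (negative) inputs the closed form floor(P_t / V_th) no longer holds, which is exactly the remaining chronological dependence that the paper's binary adder-search tree is designed to break in hardware.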
Pages: 1749-1764
Number of Pages: 16