Stochastic Spiking Attention: Accelerating Attention with Stochastic Computing in Spiking Networks

Cited by: 0
Authors
Song, Zihang [1 ]
Katti, Prabodh [1 ]
Simeone, Osvaldo [1 ]
Rajendran, Bipin [1 ]
Affiliations
[1] King's College London, Department of Engineering, London WC2R 2LS, England
Source
2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS 2024), 2024
Funding
UK Engineering and Physical Sciences Research Council (EPSRC)
Keywords
Spiking neural network; Transformer; attention; stochastic computing; hardware accelerator; neural networks; optimization
DOI
10.1109/AICAS59952.2024.10595893
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Spiking Neural Networks (SNNs) have recently been integrated into Transformer architectures due to their potential to reduce computational demands and improve power efficiency. Yet, the implementation of the attention mechanism using spiking signals on general-purpose computing platforms remains inefficient. In this paper, we propose a novel framework leveraging stochastic computing (SC) to effectively execute the dot-product attention for SNN-based Transformers. We demonstrate that our approach can achieve high classification accuracy (83.53%) on CIFAR-10 within 10 time steps, which is comparable to the performance of a baseline artificial neural network implementation (83.66%). We estimate that the proposed SC approach can lead to over a 6.3x reduction in computing energy and a 1.7x reduction in memory access costs for a digital CMOS-based ASIC design. We experimentally validate our stochastic attention block design through an FPGA implementation, which achieves 48x lower latency than a GPU implementation while consuming 15x less power.
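The abstract rests on the core idea of stochastic computing: values in [0, 1] are encoded as Bernoulli bit-streams, so that multiplication reduces to a bitwise AND and a dot product to counting coincident spikes. The following minimal NumPy sketch illustrates that principle for a single query-key score; it is not the paper's SSA hardware design, and the names (encode, sc_dot), the stream length T, and the use of NumPy are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(p, T):
    """Encode values p in [0, 1] as Bernoulli bit-streams of length T
    (unipolar stochastic-computing representation)."""
    return rng.random((T,) + p.shape) < p  # bool array of shape (T, ...)

def sc_dot(q, k, T=4096):
    """Estimate the dot product q . k with stochastic computing:
    multiplying two independent bit-streams is a bitwise AND, and the
    mean coincidence rate over T time steps estimates each q_i * k_i."""
    Q = encode(q, T)                           # (T, d) query spike trains
    K = encode(k, T)                           # (T, d) key spike trains
    prods = np.logical_and(Q, K).mean(axis=0)  # per-element product estimates
    return prods.sum()                         # stochastic dot-product estimate

# Usage: the estimate converges to the exact dot product as T grows.
q, k = rng.random(16), rng.random(16)
print(sc_dot(q, k), q @ k)
```

Because the operands are already binary spike trains in an SNN, this style of attention replaces multiply-accumulate units with AND gates and counters, which is the source of the energy and latency savings the abstract reports.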
Pages: 31-35 (5 pages)