SpikingViT: A Multiscale Spiking Vision Transformer Model for Event-Based Object Detection

Cited by: 1
Authors
Yu, Lixing [1 ]
Chen, Hanqi [1 ]
Wang, Ziming [2 ]
Zhan, Shaojie [1 ]
Shao, Jiankun [3 ]
Liu, Qingjie [4 ]
Xu, Shu [4 ]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650500, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[3] Beijing Inst Technol, State Key Lab Explos Sci & Technol, Beijing 100081, Peoples R China
[4] China Nanhu Acad Elect & Informat Technol, Jiaxing 314002, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Transformers; Cameras; Feature extraction; Data mining; Voltage control; Task analysis; DVS data converting; object detection; residual voltage memory; spiking transformer; OBJECT DETECTION; NETWORKS;
DOI
10.1109/TCDS.2024.3422873
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Event cameras have unique advantages in object detection, capturing asynchronous events without continuous frames. They excel in dynamic range, low latency, and high-speed motion scenarios, with lower power consumption. However, aggregating event data into image frames leads to information loss and reduced detection performance. Applying traditional neural networks to event camera outputs is challenging due to the distinct characteristics of event data. In this study, we present a novel spiking neural network (SNN)-based object detection model, the spiking vision transformer (SpikingViT), to address these issues. First, we design a dedicated event data converting module that effectively captures the unique characteristics of event data, mitigating the risk of information loss while preserving its spatiotemporal features. Second, we introduce SpikingViT, a novel object detection model that leverages SNNs capable of extracting spatiotemporal information from event data. SpikingViT combines the advantages of SNNs and transformer models, incorporating mechanisms such as attention and residual voltage memory to further enhance detection performance. Extensive experiments have substantiated the remarkable proficiency of SpikingViT in event-based object detection, positioning it as a formidable contender. Our proposed approach adeptly retains the spatiotemporal information inherent in event data, leading to a substantial enhancement in detection performance.
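The abstract names two mechanisms without giving implementation details: converting asynchronous events into dense spatiotemporal tensors, and spiking neurons whose residual membrane voltage acts as memory across time steps. The NumPy sketch below is a minimal, hedged illustration of those two generic ideas; the function names, binning scheme, and neuron parameters are assumptions for illustration only, not the authors' actual converting module or SpikingViT architecture.

```python
import numpy as np

def events_to_voxel(events, num_bins, height, width):
    """Accumulate asynchronous events (t, x, y, polarity) into a
    (num_bins, 2, H, W) spatiotemporal tensor. This is one possible,
    assumed conversion scheme; the paper's module may differ."""
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3].astype(int)          # polarity assumed encoded as 0/1
    vox = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    # Normalize timestamps to [0, 1) and assign each event to a temporal bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    b = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)
    np.add.at(vox, (b, p, y, x), 1.0)     # count events per bin/polarity/pixel
    return vox

class LIFNeuron:
    """Leaky integrate-and-fire unit: the membrane voltage leaks toward the
    input, emits a spike when it crosses a threshold, and keeps the residual
    (post-reset) voltage as memory for the next step (illustrative values)."""
    def __init__(self, shape, tau=2.0, v_th=1.0):
        self.v = np.zeros(shape, dtype=np.float32)   # residual voltage memory
        self.tau = tau
        self.v_th = v_th

    def step(self, current):
        self.v = self.v + (current - self.v) / self.tau   # leaky integration
        spikes = (self.v >= self.v_th).astype(np.float32)
        self.v = self.v - spikes * self.v_th               # soft reset keeps residual voltage
        return spikes

# Example: three synthetic events on a 4x4 sensor, two temporal bins.
ev = np.array([[0.0, 1, 1, 0], [0.5, 2, 3, 1], [1.0, 0, 2, 1]])
vox = events_to_voxel(ev, num_bins=2, height=4, width=4)
lif = LIFNeuron(shape=vox.shape[1:])
spike_frames = [lif.step(frame) for frame in vox]          # spikes per temporal bin
```

In this sketch the soft reset (subtracting the threshold rather than zeroing the voltage) is what carries residual voltage between time steps; how the actual model implements residual voltage memory inside its transformer blocks is described in the full paper.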
Pages: 130-146
Page count: 17