SpikingViT: A Multiscale Spiking Vision Transformer Model for Event-Based Object Detection

Cited by: 1
Authors
Yu, Lixing [1 ]
Chen, Hanqi [1 ]
Wang, Ziming [2 ]
Zhan, Shaojie [1 ]
Shao, Jiankun [3 ]
Liu, Qingjie [4 ]
Xu, Shu [4 ]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650500, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[3] Beijing Inst Technol, State Key Lab Explos Sci & Technol, Beijing 100081, Peoples R China
[4] China Nanhu Acad Elect & Informat Technol, Jiaxing 314002, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Object detection; Transformers; Cameras; Feature extraction; Data mining; Voltage control; Task analysis; DVS data converting; object detection; residual voltage memory; spiking transformer; OBJECT DETECTION; NETWORKS;
DOI
10.1109/TCDS.2024.3422873
CLC Number
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Event cameras have unique advantages in object detection, capturing asynchronous events without continuous frames. They excel in dynamic range, low latency, and high-speed motion scenarios, with lower power consumption. However, aggregating event data into image frames leads to information loss and reduced detection performance. Applying traditional neural networks to event camera outputs is challenging due to the distinct characteristics of event data. In this study, we present a novel spiking neural network (SNN)-based object detection model, the spiking vision transformer (SpikingViT), to address these issues. First, we design a dedicated event data converting module that effectively captures the unique characteristics of event data, mitigating the risk of information loss while preserving its spatiotemporal features. Second, we introduce SpikingViT, a novel object detection model that leverages SNNs capable of extracting spatiotemporal information from event data. SpikingViT combines the advantages of SNNs and transformer models, incorporating mechanisms such as attention and residual voltage memory to further enhance detection performance. Extensive experiments have substantiated the remarkable proficiency of SpikingViT in event-based object detection, positioning it as a formidable contender. Our proposed approach adeptly retains the spatiotemporal information inherent in event data, leading to a substantial enhancement in detection performance.
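The abstract names two mechanisms without giving implementation details: converting asynchronous events into dense spatiotemporal tensors, and spiking neurons whose residual membrane voltage acts as memory across time steps. The NumPy sketch below is a minimal, hedged illustration of those two generic ideas; the function names, binning scheme, and neuron parameters are assumptions for illustration only, not the authors' actual converting module or SpikingViT architecture.

```python
import numpy as np

def events_to_voxel(events, num_bins, height, width):
    """Accumulate asynchronous events (t, x, y, polarity) into a
    (num_bins, 2, H, W) spatiotemporal tensor. This is one possible,
    assumed conversion scheme; the paper's module may differ."""
    t = events[:, 0]
    x = events[:, 1].astype(int)
    y = events[:, 2].astype(int)
    p = events[:, 3].astype(int)          # polarity assumed encoded as 0/1
    vox = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    # Normalize timestamps to [0, 1) and assign each event to a temporal bin.
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)
    b = np.clip((t_norm * num_bins).astype(int), 0, num_bins - 1)
    np.add.at(vox, (b, p, y, x), 1.0)     # count events per bin/polarity/pixel
    return vox

class LIFNeuron:
    """Leaky integrate-and-fire unit: the membrane voltage leaks toward the
    input, emits a spike when it crosses a threshold, and keeps the residual
    (post-reset) voltage as memory for the next step (illustrative values)."""
    def __init__(self, shape, tau=2.0, v_th=1.0):
        self.v = np.zeros(shape, dtype=np.float32)   # residual voltage memory
        self.tau = tau
        self.v_th = v_th

    def step(self, current):
        self.v = self.v + (current - self.v) / self.tau   # leaky integration
        spikes = (self.v >= self.v_th).astype(np.float32)
        self.v = self.v - spikes * self.v_th               # soft reset keeps residual voltage
        return spikes

# Example: three synthetic events on a 4x4 sensor, two temporal bins.
ev = np.array([[0.0, 1, 1, 0], [0.5, 2, 3, 1], [1.0, 0, 2, 1]])
vox = events_to_voxel(ev, num_bins=2, height=4, width=4)
lif = LIFNeuron(shape=vox.shape[1:])
spike_frames = [lif.step(frame) for frame in vox]          # spikes per temporal bin
```

In this sketch the soft reset (subtracting the threshold rather than zeroing the voltage) is what carries residual voltage between time steps; how the actual model implements residual voltage memory inside its transformer blocks is described in the full paper.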
Pages: 130-146
Page count: 17