SpikingViT: A Multiscale Spiking Vision Transformer Model for Event-Based Object Detection

Cited by: 1
Authors
Yu, Lixing [1 ]
Chen, Hanqi [1 ]
Wang, Ziming [2 ]
Zhan, Shaojie [1 ]
Shao, Jiankun [3 ]
Liu, Qingjie [4 ]
Xu, Shu [4 ]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650500, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[3] Beijing Inst Technol, State Key Lab Explos Sci & Technol, Beijing 100081, Peoples R China
[4] China Nanhu Acad Elect & Informat Technol, Jiaxing 314002, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Transformers; Cameras; Feature extraction; Data mining; Voltage control; Task analysis; DVS data converting; object detection; residual voltage memory; spiking transformer; OBJECT DETECTION; NETWORKS;
DOI
10.1109/TCDS.2024.3422873
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Event cameras have unique advantages in object detection, capturing asynchronous events rather than continuous frames. They excel in dynamic range, low latency, and high-speed motion scenarios, with lower power consumption. However, aggregating event data into image frames leads to information loss and reduced detection performance, and applying traditional neural networks to event camera outputs is challenging due to the distinct characteristics of event data. In this study, we present a novel spiking neural network (SNN)-based object detection model, the spiking vision transformer (SpikingViT), to address these issues. First, we design a dedicated event data converting module that effectively captures the unique characteristics of event data, mitigating the risk of information loss while preserving its spatiotemporal features. Second, we introduce SpikingViT, a novel object detection model that leverages SNNs capable of extracting spatiotemporal information from event data. SpikingViT combines the advantages of SNNs and transformer models, incorporating mechanisms such as attention and residual voltage memory to further enhance detection performance. Extensive experiments have substantiated the strong performance of SpikingViT in event-based object detection, positioning it as a formidable contender. Our proposed approach adeptly retains the spatiotemporal information inherent in event data, leading to a substantial enhancement in detection performance.
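The abstract describes two ingredients: a converting module that turns asynchronous events into a spatiotemporal representation, and spiking neurons whose membrane state carries a residual voltage memory across time steps. The Python sketch below is only a rough illustration of these two ideas, assuming a generic (t, x, y, polarity) event format, a voxel-grid-style binning, and a soft-reset leaky integrate-and-fire neuron; the function names, binning scheme, and reset rule are illustrative assumptions and are not the authors' actual converting module or SpikingViT neuron.

import numpy as np

def events_to_voxel(events, height, width, num_bins):
    """Bin asynchronous events (t, x, y, polarity) into a
    (num_bins, 2, H, W) spatiotemporal tensor.

    Illustrative only: the paper's dedicated converting module is not
    specified here, so the binning and normalization are assumptions.
    """
    voxel = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    t = events[:, 0]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)   # map timestamps to [0, 1]
    bins = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    xs = events[:, 1].astype(int)
    ys = events[:, 2].astype(int)
    ps = (events[:, 3] > 0).astype(int)                     # polarity channel: 0 = OFF, 1 = ON
    np.add.at(voxel, (bins, ps, ys, xs), 1.0)               # accumulate event counts per bin
    return voxel

def lif_step(v, x, tau=2.0, v_th=1.0):
    """One leaky integrate-and-fire step (numpy arrays) with a soft
    reset, so the membrane keeps a residual voltage after spiking
    (a hypothetical stand-in for residual voltage memory).
    """
    v = v + (x - v) / tau                    # leaky integration of the input current
    spike = (v >= v_th).astype(v.dtype)
    v = v - spike * v_th                     # soft reset: residual voltage is retained
    return spike, v

# Example: five events on a 4x4 sensor, binned into 3 time steps and
# fed through the toy LIF neuron step by step.
events = np.array([[0.00, 1, 2,  1],
                   [0.01, 3, 0, -1],
                   [0.02, 2, 2,  1],
                   [0.03, 0, 1,  1],
                   [0.04, 3, 3, -1]])
frames = events_to_voxel(events, height=4, width=4, num_bins=3)
v = np.zeros((2, 4, 4), dtype=np.float32)
for frame in frames:
    spikes, v = lif_step(v, frame)           # in the real model, spikes would feed the transformer backbone

In this toy setup the membrane state v persists across the three time bins, which is the intuition behind carrying temporal context between event frames rather than processing each frame independently.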
Pages: 130-146
Number of pages: 17