SpikingViT: A Multiscale Spiking Vision Transformer Model for Event-Based Object Detection

Cited by: 1
Authors
Yu, Lixing [1 ]
Chen, Hanqi [1 ]
Wang, Ziming [2 ]
Zhan, Shaojie [1 ]
Shao, Jiankun [3 ]
Liu, Qingjie [4 ]
Xu, Shu [4 ]
Affiliations
[1] Yunnan Univ, Sch Informat Sci & Engn, Kunming 650500, Peoples R China
[2] Zhejiang Univ, Coll Comp Sci & Technol, Hangzhou 310027, Peoples R China
[3] Beijing Inst Technol, State Key Lab Explos Sci & Technol, Beijing 100081, Peoples R China
[4] China Nanhu Acad Elect & Informat Technol, Jiaxing 314002, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Object detection; Transformers; Cameras; Feature extraction; Data mining; Voltage control; Task analysis; DVS data converting; object detection; residual voltage memory; spiking transformer; OBJECT DETECTION; NETWORKS;
DOI
10.1109/TCDS.2024.3422873
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Event cameras have unique advantages in object detection, capturing asynchronous events rather than continuous frames. They excel in dynamic range, low latency, and high-speed motion scenarios, with lower power consumption. However, aggregating event data into image frames leads to information loss and reduced detection performance, and applying traditional neural networks to event camera outputs is challenging due to the distinct characteristics of event data. In this study, we present a novel spiking neural network (SNN)-based object detection model, the spiking vision transformer (SpikingViT), to address these issues. First, we design a dedicated event data converting module that effectively captures the unique characteristics of event data, mitigating the risk of information loss while preserving its spatiotemporal features. Second, we introduce SpikingViT, a novel object detection model that leverages SNNs capable of extracting spatiotemporal information from event data. SpikingViT combines the advantages of SNNs and transformer models, incorporating mechanisms such as attention and residual voltage memory to further enhance detection performance. Extensive experiments have substantiated the strong performance of SpikingViT in event-based object detection, positioning it as a formidable contender. Our proposed approach adeptly retains the spatiotemporal information inherent in event data, leading to a substantial enhancement in detection performance.
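The abstract describes two ingredients: a converting module that turns asynchronous events into a spatiotemporal representation, and spiking neurons whose membrane state carries a residual voltage memory across time steps. The Python sketch below is only a rough illustration of these two ideas, assuming a generic (t, x, y, polarity) event format, a voxel-grid-style binning, and a soft-reset leaky integrate-and-fire neuron; the function names, binning scheme, and reset rule are illustrative assumptions and are not the authors' actual converting module or SpikingViT neuron.

import numpy as np

def events_to_voxel(events, height, width, num_bins):
    """Bin asynchronous events (t, x, y, polarity) into a
    (num_bins, 2, H, W) spatiotemporal tensor.

    Illustrative only: the paper's dedicated converting module is not
    specified here, so the binning and normalization are assumptions.
    """
    voxel = np.zeros((num_bins, 2, height, width), dtype=np.float32)
    t = events[:, 0]
    t_norm = (t - t.min()) / max(t.max() - t.min(), 1e-9)   # map timestamps to [0, 1]
    bins = np.minimum((t_norm * num_bins).astype(int), num_bins - 1)
    xs = events[:, 1].astype(int)
    ys = events[:, 2].astype(int)
    ps = (events[:, 3] > 0).astype(int)                     # polarity channel: 0 = OFF, 1 = ON
    np.add.at(voxel, (bins, ps, ys, xs), 1.0)               # accumulate event counts per bin
    return voxel

def lif_step(v, x, tau=2.0, v_th=1.0):
    """One leaky integrate-and-fire step (numpy arrays) with a soft
    reset, so the membrane keeps a residual voltage after spiking
    (a hypothetical stand-in for residual voltage memory).
    """
    v = v + (x - v) / tau                    # leaky integration of the input current
    spike = (v >= v_th).astype(v.dtype)
    v = v - spike * v_th                     # soft reset: residual voltage is retained
    return spike, v

# Example: five events on a 4x4 sensor, binned into 3 time steps and
# fed through the toy LIF neuron step by step.
events = np.array([[0.00, 1, 2,  1],
                   [0.01, 3, 0, -1],
                   [0.02, 2, 2,  1],
                   [0.03, 0, 1,  1],
                   [0.04, 3, 3, -1]])
frames = events_to_voxel(events, height=4, width=4, num_bins=3)
v = np.zeros((2, 4, 4), dtype=np.float32)
for frame in frames:
    spikes, v = lif_step(v, frame)           # in the real model, spikes would feed the transformer backbone

In this toy setup the membrane state v persists across the three time bins, which is the intuition behind carrying temporal context between event frames rather than processing each frame independently.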
Pages: 130-146
Number of pages: 17