SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation

Times cited: 0
Authors
Li, Jingzhong
Yang, Lin [1]
Shi, Zhen
Chen, Yuxuan
Jin, Yue
Akiyama, Kanta
Xu, Anze
Affiliation
[1] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai, Peoples R China
Keywords
3D object detection; Sparse scene representation; Bird's eye view; Multi-view cameras; Autonomous driving
DOI
10.1016/j.aei.2024.102955
CLC classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Efficient and reliable 3D object detection from multi-view cameras is pivotal for improving the safety of autonomous driving systems and enabling their cost-effective deployment. However, because existing methods learn dense scene representations, they still suffer from high computational costs and excessive noise, which degrade both the efficiency and the accuracy of inference. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, we first introduce a sparse sampling module with category-aware and geometry-aware supervision that adaptively samples foreground features at both the semantic and instance levels. In addition, to conserve computational resources while retaining contextual information, we propose a background aggregation module that compresses the extensive background features into a compact set. Together, these strategies markedly reduce the feature volume while preserving essential information, improving computational efficiency without compromising accuracy. Owing to its efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments show that SparseDet surpasses PETR while reducing the decoder's computational complexity by 47% in FLOPs, and reaches a leading inference speed of 35.6 FPS on a single RTX 3090 GPU.
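The abstract describes two operations only at a high level: sampling foreground features and compressing the remaining background features into a compact set. The following is a minimal sketch of that general idea under simple assumptions, not SparseDet's actual implementation. Per-token foreground scores are taken as given (standing in for the category-aware and geometry-aware supervision), foreground selection is a plain top-k, and background aggregation is a hypothetical chunk-wise mean rather than a learned module. All function names (`sparse_sample`, `aggregate_background`, `sparse_scene_tokens`) are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def sparse_sample(fg_scores: Sequence[float], k: int) -> Tuple[List[int], List[int]]:
    """Split token indices into the k highest-scoring (foreground) tokens
    and the remaining (background) tokens, each returned in index order.

    Assumption: a plain top-k over given scores stands in for the paper's
    supervised sparse sampling module."""
    n = len(fg_scores)
    k = max(0, min(k, n))  # clamp k to a valid range
    # Rank tokens by predicted foreground score, highest first.
    order = sorted(range(n), key=lambda i: fg_scores[i], reverse=True)
    return sorted(order[:k]), sorted(order[k:])


def aggregate_background(
    features: Sequence[Vector], bg_idx: Sequence[int], num_groups: int
) -> List[Vector]:
    """Compress background tokens into at most `num_groups` vectors.

    Hypothetical stand-in for a learned aggregation: background indices
    are split into contiguous, nearly equal chunks, and each chunk is
    replaced by the element-wise mean of its feature vectors."""
    n = len(bg_idx)
    if n == 0 or num_groups <= 0:
        return []
    g = min(num_groups, n)  # guarantees every chunk holds at least one token
    dim = len(features[bg_idx[0]])
    summary: List[Vector] = []
    for j in range(g):
        chunk = bg_idx[j * n // g : (j + 1) * n // g]
        summary.append(
            [sum(features[i][d] for i in chunk) / len(chunk) for d in range(dim)]
        )
    return summary


def sparse_scene_tokens(
    features: Sequence[Vector], fg_scores: Sequence[float], k: int, num_groups: int
) -> List[Vector]:
    """Return the sampled foreground tokens followed by the background summary."""
    fg_idx, bg_idx = sparse_sample(fg_scores, k)
    fg_tokens = [list(features[i]) for i in fg_idx]
    return fg_tokens + aggregate_background(features, bg_idx, num_groups)


if __name__ == "__main__":
    feats = [[float(i), float(i)] for i in range(8)]  # 8 toy 2-D tokens
    scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.7, 0.0]  # toy foreground scores
    tokens = sparse_scene_tokens(feats, scores, k=3, num_groups=2)
    print(len(feats), "->", len(tokens))  # prints: 8 -> 5
```

Shrinking the token set matters because, in a standard transformer decoder, cross-attention cost grows linearly with the number of key/value tokens. The toy example only illustrates the mechanism and does not reproduce the 47% FLOPs reduction reported in the abstract. For scale, the reported 35.6 FPS corresponds to roughly 28.1 ms per frame (1000 / 35.6).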
Pages: 14
References: 56