SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation

Cited by: 0
Authors
Li, Jingzhong
Yang, Lin [1]
Shi, Zhen
Chen, Yuxuan
Jin, Yue
Akiyama, Kanta
Xu, Anze
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai, Peoples R China
Keywords
3D object detection; Sparse scene representation; Bird's eye view; Multi-view cameras; Autonomous driving;
DOI
10.1016/j.aei.2024.102955
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Efficient and reliable 3D object detection via multi-view cameras is pivotal for improving the safety and facilitating the cost-effective deployment of autonomous driving systems. However, because they learn dense scene representations, existing methods still suffer from high computational costs and excessive noise, which reduce the efficiency and accuracy of inference. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module designed to compress extensive background features into a compact set. Together, these strategies markedly reduce feature volume while preserving essential information, boosting computational efficiency without compromising accuracy. Owing to its efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments validate that SparseDet surpasses PETR while reducing the decoder computational complexity by 47% in terms of FLOPs, facilitating a leading inference speed of 35.6 FPS on a single RTX 3090 GPU.
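As a rough illustration of the two ideas named in the abstract (keep a sparse set of foreground features, compress the remaining background features into a compact set), the following NumPy sketch may help. It is not the authors' implementation: the top-k selection rule, the softmax-attention pooling, the function name `sparsify_tokens`, and all parameters are assumptions made for illustration only, and the supervision described in the abstract is not modeled.

```python
import numpy as np

def sparsify_tokens(features, fg_scores, k, bg_queries):
    """Hypothetical sketch: keep the k highest-scoring tokens as foreground
    and pool all remaining tokens into len(bg_queries) background tokens.

    features:   (n, c) flattened image features
    fg_scores:  (n,)   foreground scores (assumed to come from some predictor)
    bg_queries: (m, c) query vectors (assumed learned; random in this demo)
    """
    n, c = features.shape
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < n")

    # Sort token indices by descending foreground score.
    order = np.argsort(-fg_scores)
    fg_idx, bg_idx = order[:k], order[k:]
    fg_tokens = features[fg_idx]
    bg_feats = features[bg_idx]

    # Softmax attention from each query over the background tokens.
    logits = bg_queries @ bg_feats.T / np.sqrt(c)      # (m, n - k)
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)
    bg_tokens = weights @ bg_feats                     # (m, c)

    # The compact representation: k foreground + m background tokens.
    return np.concatenate([fg_tokens, bg_tokens], axis=0), fg_idx

# Demo with random data: 1000 dense tokens shrink to 100 + 8 tokens.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 32))
fg_scores = rng.random(1000)
bg_queries = rng.normal(size=(8, 32))
tokens, kept = sparsify_tokens(features, fg_scores, k=100, bg_queries=bg_queries)
print(tokens.shape)
```

In this sketch a downstream decoder would attend over 108 tokens instead of 1000, which is the kind of reduction in feature volume the abstract refers to; the actual token counts and selection criteria used by SparseDet are not stated in this record.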
Pages: 14
Related papers
50 in total
  • [1] Multi-View Attentive Contextualization for Multi-View 3D Object Detection
    Liu, Xianpeng
    Zheng, Ce
    Qian, Ming
    Xue, Nan
    Chen, Chen
    Zhang, Zhebin
    Li, Chen
    Wu, Tianfu
    2024 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2024, : 16688 - 16698
  • [2] Multi-view representation and synthesis for 3D object movie
    Lie, WN
    Wei, BE
    2002 INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, VOL II, PROCEEDINGS, 2002, : 529 - 532
  • [3] MULTI-VIEW OBJECT AND HUMAN BODY PART DETECTION UTILIZING 3D SCENE INFORMATION
    Sfiris, Georgios
    Nikolaidis, Nikolaos
    Pitas, Ioannis
    2010 IEEE INTERNATIONAL CONFERENCE ON IMAGE PROCESSING, 2010, : 29 - 32
  • [4] Visual Object Tracking via Multi-view and Group Sparse Representation
    Mo, Borui
    He, Ke
    Men, Aidong
    2016 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA (ICCC), 2016,
  • [5] Learning Disentangled Representation for Multi-View 3D Object Recognition
    Huang, Jingjia
    Yan, Wei
    Li, Ge
    Li, Thomas
    Liu, Shan
    IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2022, 32 (02) : 646 - 659
  • [6] Viewpoint Equivariance for Multi-View 3D Object Detection
    Chen, Dian
    Li, Jie
    Guizilini, Vitor
    Ambrus, Rares
    Gaidon, Adrien
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 9213 - 9222
  • [7] Towards Domain Generalization for Multi-view 3D Object Detection in Bird-Eye-View
    Wang, Shuo
    Zhao, Xinhai
    Xu, Hai-Ming
    Chen, Zehui
    Yu, Dameng
    Chang, Jiahao
    Yang, Zhen
    Zhao, Feng
    2023 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2023, : 13333 - 13342
  • [8] SparseDet: Towards End-to-End 3D Object Detection
    Han, Jianhong
    Wan, Zhaoyi
    Liu, Zhe
    Feng, Jie
    Zhou, Bingfeng
    PROCEEDINGS OF THE 17TH INTERNATIONAL JOINT CONFERENCE ON COMPUTER VISION, IMAGING AND COMPUTER GRAPHICS THEORY AND APPLICATIONS (VISAPP), VOL 4, 2022, : 781 - 792
  • [9] Exploring Object-Centric Temporal Modeling for Efficient Multi-View 3D Object Detection
    Wang, Shihao
    Liu, Yingfei
    Wang, Tiancai
    Li, Ying
    Zhang, Xiangyu
    2023 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION, ICCV, 2023, : 3598 - 3608
  • [10] Towards stable and salient multi-view representation of 3D shapes
    Yamauchi, Hitoshi
    Saleem, Waqar
    Yoshizawa, Shin
    Karni, Zachi
    Belyaev, Alexander
    Seidel, Hans-Peter
    IEEE INTERNATIONAL CONFERENCE ON SHAPE MODELING AND APPLICATIONS 2006, PROCEEDINGS, 2006, : 265 - +