SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation

Times cited: 0
Authors
Li, Jingzhong
Yang, Lin [1]
Shi, Zhen
Chen, Yuxuan
Jin, Yue
Akiyama, Kanta
Xu, Anze
Affiliation
[1] School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, People's Republic of China
Keywords
3D object detection; Sparse scene representation; Bird's eye view; Multi-view cameras; Autonomous driving
DOI
10.1016/j.aei.2024.102955
Chinese Library Classification (CLC) code
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Efficient and reliable 3D object detection from multi-view cameras is pivotal for improving the safety of autonomous driving systems and enabling their cost-effective deployment. However, because existing methods learn dense scene representations, they still suffer from high computational cost and excessive noise, which degrade both the efficiency and the accuracy of inference. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, we first introduce a sparse sampling module with category-aware and geometry-aware supervision that adaptively samples foreground features at both the semantic and instance levels. In addition, to conserve computational resources while retaining context information, we propose a background aggregation module that compresses the extensive background features into a compact set. Together, these strategies markedly reduce the feature volume while preserving essential information, improving computational efficiency without compromising accuracy. Owing to this efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments show that SparseDet outperforms PETR while reducing decoder FLOPs by 47%, and reaches a leading inference speed of 35.6 FPS on a single RTX 3090 GPU.
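The abstract does not say how foreground tokens are scored or how background features are compressed, so the following is only a minimal pure-Python sketch of the general idea of a sparse scene representation. It assumes that per-token foreground scores are already available (a stand-in for the category-aware and geometry-aware supervision) and uses chunked mean pooling as a hypothetical substitute for the background aggregation module. The names `sparse_scene_tokens` and `mean_vector` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized feature vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def sparse_scene_tokens(
    features: Sequence[Vector],
    fg_scores: Sequence[float],
    num_foreground: int,
    num_background: int,
) -> Tuple[List[Vector], List[Vector]]:
    """Split dense tokens into top-k foreground tokens plus a few pooled background tokens.

    features: dense image-feature tokens (one vector per pixel or patch).
    fg_scores: hypothetical per-token foreground scores; how they are
        predicted and supervised is an assumption, not the paper's design.
    """
    if len(features) != len(fg_scores):
        raise ValueError("features and fg_scores must have the same length")

    # Sparse sampling: rank token indices by foreground score, highest first,
    # and keep the top num_foreground tokens in their original (spatial) order.
    order = sorted(range(len(features)), key=lambda i: fg_scores[i], reverse=True)
    fg_idx = sorted(order[:num_foreground])
    bg_idx = sorted(order[num_foreground:])
    foreground = [list(features[i]) for i in fg_idx]

    # Background aggregation (assumed scheme): average the remaining tokens in
    # contiguous, non-empty chunks so a few tokens summarize the context.
    background: List[Vector] = []
    if bg_idx and num_background > 0:
        groups = min(num_background, len(bg_idx))
        for g in range(groups):
            start = g * len(bg_idx) // groups
            end = (g + 1) * len(bg_idx) // groups
            chunk = [features[i] for i in bg_idx[start:end]]
            background.append(mean_vector(chunk))
    return foreground, background


if __name__ == "__main__":
    # Eight dense 2-D tokens reduced to 3 foreground + 2 background tokens.
    feats = [[float(i), float(i)] for i in range(8)]
    scores = [0.1, 0.9, 0.2, 0.8, 0.05, 0.3, 0.7, 0.0]
    fg, bg = sparse_scene_tokens(feats, scores, num_foreground=3, num_background=2)
    print(len(feats), "->", len(fg) + len(bg), "tokens")
```

In this toy run, eight dense tokens shrink to five. Because the cost of a decoder's cross-attention grows with the number of key tokens it attends to, shrinking the token set in this way is one plausible route to the kind of decoder FLOP savings the abstract reports. The actual 47% figure depends on the paper's own modules and settings, which this sketch does not reproduce.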
Pages: 14