SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation

Cited by: 0

Authors:

Li, Jingzhong

Yang, Lin [1]

Shi, Zhen

Chen, Yuxuan

Jin, Yue

Akiyama, Kanta

Xu, Anze

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai, Peoples R China

Source:

ADVANCED ENGINEERING INFORMATICS | 2024 / Vol. 62

Keywords:

3D object detection; Sparse scene representation; Bird's eye view; Multi-view cameras; Autonomous driving;

DOI:

10.1016/j.aei.2024.102955

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Efficient and reliable 3D object detection with multi-view cameras is pivotal for improving the safety of autonomous driving systems and enabling their cost-effective deployment. However, because existing methods learn dense scene representations, they still suffer from high computational cost and excessive noise, which degrades both the efficiency and the accuracy of inference. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both the semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module that compresses extensive background features into a compact set. Together, these strategies markedly reduce feature volume while preserving essential information, improving computational efficiency without compromising accuracy. Owing to this efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments validate that SparseDet surpasses PETR while reducing decoder computational complexity by 47% in terms of FLOPs, enabling a leading inference speed of 35.6 FPS on a single RTX 3090 GPU.
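The two ideas named in the abstract, keeping a sparse set of foreground features and compressing the remaining background features into a compact set, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `sparsify_tokens`, the parameters `k` and `m`, the use of a precomputed per-token foreground score, and the mean-pooling of background tokens are all assumptions made for illustration. The abstract does not specify how sampling or aggregation is actually performed.

```python
import numpy as np


def sparsify_tokens(features, fg_scores, k, m):
    """Keep the k highest-scoring tokens and pool the rest into at most m tokens.

    Hypothetical illustration only (not the SparseDet method):
      features  : (N, C) array of image feature tokens
      fg_scores : (N,) array of assumed foreground scores, higher = more foreground
      k         : number of foreground tokens to keep, 0 < k < N
      m         : maximum number of aggregated background tokens, m >= 1
    Returns the reduced token set of shape (k + min(m, N - k), C) and the
    indices of the kept foreground tokens.
    """
    n = features.shape[0]
    if not (0 < k < n) or m < 1:
        raise ValueError("need 0 < k < N and m >= 1")

    # Rank tokens by descending foreground score (stable for reproducibility).
    order = np.argsort(-fg_scores, kind="stable")
    fg_idx, bg_idx = order[:k], order[k:]

    # Foreground tokens are kept as-is.
    foreground = features[fg_idx]

    # Background tokens are split into groups and mean-pooled, which is a
    # simple stand-in for a learned aggregation module.
    groups = np.array_split(features[bg_idx], min(m, len(bg_idx)))
    background = np.stack([g.mean(axis=0) for g in groups])

    return np.concatenate([foreground, background], axis=0), fg_idx
```

With 8 input tokens, `k=3` and `m=2`, the function returns 5 tokens, so any downstream module that attends over these tokens processes 5 keys instead of 8.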

Pages: 14
