SparseDet: Towards efficient multi-view 3D object detection via sparse scene representation

Citations: 0

Authors:

Li, Jingzhong

Yang, Lin [1]

Shi, Zhen

Chen, Yuxuan

Jin, Yue

Akiyama, Kanta

Xu, Anze

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai, Peoples R China

Source:

ADVANCED ENGINEERING INFORMATICS | 2024 / Vol. 62

Keywords:

3D object detection; Sparse scene representation; Bird's eye view; Multi-view cameras; Autonomous driving;

DOI:

10.1016/j.aei.2024.102955

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Efficient and reliable 3D object detection via multi-view cameras is pivotal for improving the safety and facilitating the cost-effective deployment of autonomous driving systems. However, because they learn dense scene representations, existing methods still suffer from high computational costs and excessive noise, which affect the efficiency and accuracy of inference. To overcome this challenge, we propose SparseDet, a model that exploits sparse scene representations. Specifically, a sparse sampling module with category-aware and geometry-aware supervision is first introduced to adaptively sample foreground features at both semantic and instance levels. Additionally, to conserve computational resources while retaining context information, we propose a background aggregation module designed to compress extensive background features into a compact set. These strategies markedly reduce feature volume while preserving essential information, boosting computational efficiency without compromising accuracy. Owing to the efficient sparse scene representation, SparseDet achieves leading performance on the widely used nuScenes benchmark. Comprehensive experiments validate that SparseDet surpasses PETR while reducing the decoder computational complexity by 47% in terms of FLOPs, facilitating a leading inference speed of 35.6 FPS on a single RTX 3090 GPU.
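
The two ideas named in the abstract, keeping a small set of foreground features and compressing the remaining background features into a compact set, can be illustrated with a toy sketch. This is not the authors' implementation: the function name `sparsify_tokens`, the use of plain top-k selection on given scores, and chunked mean pooling are all assumptions chosen for illustration. In the paper the sampling is learned under category-aware and geometry-aware supervision, which this sketch does not model.

```python
from typing import List, Sequence, Tuple


def sparsify_tokens(
    features: Sequence[Sequence[float]],
    fg_scores: Sequence[float],
    k: int,
    num_bg_groups: int,
) -> Tuple[List[List[float]], List[int]]:
    """Keep the k highest-scoring tokens and compress the rest.

    Hypothetical helper: top-k selection and chunked mean pooling stand in
    for the learned sampling and background aggregation modules.
    """
    n = len(features)
    if not 0 < k <= n:
        raise ValueError("k must be in 1..len(features)")

    # Rank token indices by foreground score, highest first.
    order = sorted(range(n), key=lambda i: fg_scores[i], reverse=True)
    fg_idx = sorted(order[:k])  # restore the original token order
    bg_idx = sorted(order[k:])

    kept = [list(features[i]) for i in fg_idx]

    # Split background tokens into contiguous groups and average each group.
    groups = min(num_bg_groups, len(bg_idx))
    for g in range(groups):
        start = g * len(bg_idx) // groups
        stop = (g + 1) * len(bg_idx) // groups
        chunk = [features[i] for i in bg_idx[start:stop]]
        dim = len(chunk[0])
        kept.append([sum(tok[d] for tok in chunk) / len(chunk) for d in range(dim)])
    return kept, fg_idx


features = [[1.0, 0.0], [0.0, 2.0], [4.0, 4.0], [2.0, 6.0], [8.0, 0.0], [0.0, 0.0]]
scores = [0.9, 0.1, 0.8, 0.2, 0.05, 0.3]
tokens, fg_idx = sparsify_tokens(features, scores, k=2, num_bg_groups=2)
print(len(features), "->", len(tokens), "tokens; foreground indices:", fg_idx)
```

Here six input tokens shrink to four (two foreground tokens plus two pooled background tokens), so any downstream attention over the tokens touches fewer elements while still seeing a summary of the background.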


Pages: 14
