Fully Sparse Fusion for 3D Object Detection

Cited by: 12
Authors
Li, Yingyan [1,2]
Fan, Lue [1,2]
Liu, Yang [1]
Huang, Zehao [3]
Chen, Yuntao [4]
Wang, Naiyan [3]
Zhang, Zhaoxiang [1,2,4]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Ctr Research on Intelligent Percept & Comp CRIPAC, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Future Technol, Beijing 100049, Peoples R China
[3] TuSimple, Beijing 100020, Peoples R China
[4] Chinese Acad Sci HKISICAS, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Feature extraction; Laser radar; Cameras; Detectors; Instance segmentation; Point cloud compression; 3D object detection; multi-sensor fusion; fully sparse architecture; autonomous driving; long-range perception;
DOI
10.1109/TPAMI.2024.3392303
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104; 0812; 0835; 1405;
Abstract
Currently prevalent multi-modal 3D detection methods rely on dense detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps grows quadratically with the detection range, making them unscalable for long-range detection. Recently, the LiDAR-only fully sparse architecture has gained attention for its high efficiency in long-range perception. In this paper, we study how to develop a multi-modal fully sparse detector. Specifically, our proposed detector integrates the well-studied 2D instance segmentation on the image side, which runs in parallel with the 3D instance segmentation in the LiDAR-only baseline. The proposed instance-based fusion framework maintains full sparsity while overcoming the constraints of the LiDAR-only fully sparse detector. Our framework achieves state-of-the-art performance on the widely used nuScenes dataset, the Waymo Open Dataset, and the long-range Argoverse 2 dataset. Notably, under the long-range perception setting, the inference speed of our proposed method is 2.7x faster than that of other state-of-the-art multi-modal 3D detection methods.
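The abstract's scaling claim can be illustrated with a back-of-the-envelope sketch (not from the paper; the cell size and ranges below are assumed for illustration only): a dense BEV grid covering a square region has a cell count quadratic in the detection range, so doubling the range quadruples the feature-map cost, whereas a fully sparse detector's cost tracks only the occupied cells.

```python
# Illustrative sketch of the quadratic BEV cost argument.
# Assumed, not from the paper: 0.2 m cells, 75 m / 150 m ranges.

def dense_bev_cells(detection_range_m: float, cell_size_m: float = 0.2) -> int:
    """Cells in a square dense BEV grid covering [-range, range] in x and y."""
    side = int(2 * detection_range_m / cell_size_m)
    return side * side

short_range = dense_bev_cells(75)    # nuScenes-like range
long_range = dense_bev_cells(150)   # Argoverse-2-like long range

# Doubling the range quadruples the dense feature map.
print(long_range / short_range)  # -> 4.0
```

A sparse architecture instead spends compute only on non-empty voxels, whose count grows far more slowly, since distant regions of a LiDAR sweep are mostly empty.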
Pages: 7217-7231
Page count: 15