Fully Sparse Fusion for 3D Object Detection

Citations: 7
Authors
Li, Yingyan [1 ,2 ]
Fan, Lue [1 ,2 ]
Liu, Yang [1 ]
Huang, Zehao [3 ]
Chen, Yuntao [4 ]
Wang, Naiyan [3 ]
Zhang, Zhaoxiang [1 ,2 ,4 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Ctr Research on Intelligent Percept & Comp CRIPAC, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Future Technol, Beijing 100049, Peoples R China
[3] TuSimple, Beijing 100020, Peoples R China
[4] Chinese Acad Sci HKISI CAS, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Feature extraction; Laser radar; Cameras; Detectors; Instance segmentation; Point cloud compression; 3D object detection; multi-sensor fusion; fully sparse architecture; autonomous driving; long-range perception;
DOI
10.1109/TPAMI.2024.3392303
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Currently prevalent multi-modal 3D detection methods rely on dense detectors that usually operate on dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps grows quadratically with the detection range, so they do not scale to long-range detection. Recently, the LiDAR-only fully sparse architecture has gained attention for its high efficiency in long-range perception. In this paper, we study how to develop a multi-modal fully sparse detector. Specifically, our proposed detector integrates well-studied 2D instance segmentation into the LiDAR side, in parallel with the 3D instance segmentation part of the LiDAR-only baseline. The proposed instance-based fusion framework maintains full sparsity while overcoming the constraints of the LiDAR-only fully sparse detector. Our framework achieves state-of-the-art performance on the widely used nuScenes dataset, the Waymo Open Dataset, and the long-range Argoverse 2 dataset. Notably, under the long-range perception setting, the inference speed of our method is 2.7x faster than that of other state-of-the-art multi-modal 3D detection methods.
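The Python sketch below is a minimal illustration of the two points summarized in the abstract, not the paper's actual implementation: bev_grid_cells shows that the number of dense BEV cells grows quadratically with the detection range, while group_points_by_2d_instances shows an instance-based alternative that only groups the sparse LiDAR points projecting into each 2D instance mask. The function names, the cell size, and the toy camera intrinsics are hypothetical choices made for the example.

import numpy as np


def bev_grid_cells(detection_range_m, cell_size_m=0.2):
    # Dense BEV grid covering [-R, R] x [-R, R]: the cell count grows
    # quadratically with the detection range R.
    cells_per_axis = int(np.ceil(2.0 * detection_range_m / cell_size_m))
    return cells_per_axis ** 2


def group_points_by_2d_instances(points_xyz, cam_intrinsics, instance_mask):
    # Group LiDAR points (already in the camera frame) by the 2D instance
    # mask they project onto. A simplified stand-in for instance-based
    # fusion: no dense BEV map is built, only per-instance point groups.
    groups = {}
    h, w = instance_mask.shape
    for idx, (x, y, z) in enumerate(points_xyz):
        if z <= 0:                       # point behind the camera
            continue
        u, v, _ = cam_intrinsics @ np.array([x, y, z])
        u, v = int(u / z), int(v / z)    # perspective projection to pixels
        if 0 <= u < w and 0 <= v < h:
            inst_id = int(instance_mask[v, u])
            if inst_id > 0:              # 0 denotes background
                groups.setdefault(inst_id, []).append(idx)
    return groups


if __name__ == "__main__":
    # Doubling the range from 75 m to 150 m quadruples the BEV cell count.
    print(bev_grid_cells(75.0), bev_grid_cells(150.0))   # 562500 2250000

    # Toy example: one 2D instance (id 1) at pixel (2, 2) of a 4x4 mask.
    K = np.array([[2.0, 0.0, 2.0],
                  [0.0, 2.0, 2.0],
                  [0.0, 0.0, 1.0]])
    pts = np.array([[0.0, 0.0, 1.0],     # projects onto the instance
                    [0.5, 0.0, 1.0],     # projects onto background
                    [0.0, 0.0, -1.0]])   # behind the camera
    mask = np.zeros((4, 4), dtype=int)
    mask[2, 2] = 1
    print(group_points_by_2d_instances(pts, K, mask))    # {1: [0]}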
Pages: 7217 - 7231
Page count: 15