Fully Sparse Fusion for 3D Object Detection

Cited by: 7
Authors
Li, Yingyan [1 ,2 ]
Fan, Lue [1 ,2 ]
Liu, Yang [1 ]
Huang, Zehao [3 ]
Chen, Yuntao [4 ]
Wang, Naiyan [3 ]
Zhang, Zhaoxiang [1 ,2 ,4 ]
Affiliations
[1] Chinese Acad Sci CASIA, Inst Automat, Ctr Research on Intelligent Percept & Comp CRIPAC, State Key Lab Multimodal Artificial Intelligence Syst, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci UCAS, Sch Future Technol, Beijing 100049, Peoples R China
[3] TuSimple, Beijing 100020, Peoples R China
[4] Chinese Acad Sci HKISICAS, Hong Kong Inst Sci & Innovat, Ctr Artificial Intelligence & Robot, Hong Kong, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Feature extraction; Laser radar; Cameras; Detectors; Instance segmentation; Point cloud compression; 3D object detection; multi-sensor fusion; fully sparse architecture; autonomous driving; long-range perception;
DOI
10.1109/TPAMI.2024.3392303
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Currently prevalent multi-modal 3D detection methods rely on dense detectors that usually use dense Bird's-Eye-View (BEV) feature maps. However, the cost of such BEV feature maps grows quadratically with the detection range, which makes them unscalable for long-range detection. Recently, the LiDAR-only fully sparse architecture has been gaining attention for its high efficiency in long-range perception. In this paper, we study how to develop a multi-modal fully sparse detector. Specifically, our proposed detector integrates well-studied 2D instance segmentation into the LiDAR side, in parallel with the 3D instance segmentation part of the LiDAR-only baseline. The proposed instance-based fusion framework maintains full sparsity while overcoming the constraints associated with the LiDAR-only fully sparse detector. Our framework achieves state-of-the-art performance on the widely used nuScenes dataset, the Waymo Open Dataset, and the long-range Argoverse 2 dataset. Notably, under the long-range perception setting, the inference speed of our proposed method is 2.7x faster than that of other state-of-the-art multi-modal 3D detection methods.
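To make the instance-based fusion idea concrete, the minimal Python sketch below (not the authors' code; function names such as project_to_image, group_points_by_mask, and fuse_instance_features are hypothetical) shows one way per-instance fusion can stay fully sparse: LiDAR points are projected into the image, grouped by the 2D instance masks they fall into, and each paired instance carries a concatenated LiDAR/image feature, so no dense BEV grid is ever built.

import numpy as np

def project_to_image(points_cam, K):
    """Project Nx3 points (already in the camera frame) to pixel coordinates."""
    uvw = points_cam @ K.T                          # pinhole projection
    return uvw[:, :2] / np.clip(uvw[:, 2:3], 1e-6, None)

def group_points_by_mask(points_cam, masks, K):
    """Assign each LiDAR point to a 2D instance mask id (-1 if it hits none).

    masks: (M, H, W) boolean instance masks from any 2D instance segmenter.
    """
    M, H, W = masks.shape
    uv = np.round(project_to_image(points_cam, K)).astype(int)
    u = uv[:, 0].clip(0, W - 1)
    v = uv[:, 1].clip(0, H - 1)
    in_view = (points_cam[:, 2] > 0) & (uv[:, 0] >= 0) & (uv[:, 0] < W) \
              & (uv[:, 1] >= 0) & (uv[:, 1] < H)
    ids = np.full(len(points_cam), -1, dtype=int)
    for m in range(M):
        hit = in_view & masks[m, v, u] & (ids == -1)
        ids[hit] = m
    return ids

def fuse_instance_features(lidar_feats, image_feats, pairs):
    """Concatenate features of paired (LiDAR instance, 2D instance) ids."""
    return {li: np.concatenate([lidar_feats[li], image_feats[mi]])
            for li, mi in pairs}

The point of the sketch is that features are indexed by instance id rather than by a dense grid, so memory and compute scale with the number of detected objects instead of with the square of the detection range.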
Pages: 7217 - 7231
Page count: 15