Scalable 3D Object Detection Pipeline With Center-Based Sequential Feature Aggregation for Intelligent Vehicles

Cited by: 4
Authors
Jiang, Qi [1]
Hu, Chuan [2]
Zhao, Baixuan [1]
Huang, Yonghui [1]
Zhang, Xi [2]
Affiliations
[1] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Intelligent Vehicle Inst, Sch Mech Engn, Shanghai 200240, Peoples R China
Source
IEEE TRANSACTIONS ON INTELLIGENT VEHICLES | 2024 / Vol. 9 / No. 1
Funding
National Natural Science Foundation of China;
Keywords
Three-dimensional displays; Laser radar; Feature extraction; Proposals; Detectors; Semantics; Object detection; 3D object detection; multi-sensor fusion; intelligent vehicle perception; feature aggregation;
DOI
10.1109/TIV.2023.3299619
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
3D object detection plays a key role in the perception system of intelligent vehicles. The reliable 3D structural information provided by LiDAR points enables accurate regression of position and pose, but the semantic ambiguity caused by sparse points remains challenging. In this article, a scalable 3D object detection pipeline, CenterSFA, and a series of new modules are proposed to improve detection performance. In contrast to previous point-level fusion models, semantic and geometric cues from images are utilized sequentially in a center-based paradigm. The object centers are accurately predicted with semantic guidance and selectively employed as the basis for feature aggregation and property regression. Specifically, an attention mechanism is used in the semantic and spatial similarity calculation, enabling surrounding feature aggregation for multi-scale objects. An instance-level correlation is established between the camera feature and the BEV feature for cross-modal feature aggregation. Extensive experiments are conducted on the large-scale nuScenes dataset to verify the state-of-the-art performance of the proposed model, especially for occluded objects and far-range detection. On the val set, the proposed model outperforms the competitive CenterPoint by 10.4% in mAP and 5.4% in NDS, and the representative fusion method MVP by 2.8% in mAP and 1.6% in NDS, indicating its superiority in accurate 3D detection.
Pages: 1512-1523
Page count: 12