Scalable 3D Object Detection Pipeline With Center-Based Sequential Feature Aggregation for Intelligent Vehicles

Cited by: 4

Authors:

Jiang, Qi [1]

Hu, Chuan [2]

Zhao, Baixuan [1]

Huang, Yonghui [1]

Zhang, Xi [2]

Affiliations:

[1] Shanghai Jiao Tong Univ, Sch Mech Engn, Shanghai 200240, Peoples R China

[2] Shanghai Jiao Tong Univ, Intelligent Vehicle Inst, Sch Mech Engn, Shanghai 200240, Peoples R China

Source:

IEEE TRANSACTIONS ON INTELLIGENT VEHICLES | 2024 / Vol. 9 / No. 1

Funding:

National Natural Science Foundation of China;

Keywords:

Three-dimensional displays; Laser radar; Feature extraction; Proposals; Detectors; Semantics; Object detection; 3D object detection; multi-sensor fusion; intelligent vehicle perception; feature aggregation;

DOI:

10.1109/TIV.2023.3299619

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification code:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

3D object detection plays a key role in the perception system of intelligent vehicles. The reliable 3D structural information provided by LiDAR points enables the accurate regression of position and pose, while the semantic ambiguity issue caused by the sparse points is still challenging. In this article, a scalable 3D object detection pipeline CenterSFA and a series of new modules are proposed to improve the detection performance. In contrast to previous point-level fusing models, semantic and geometric cues from images are sequentially utilized in a center-based paradigm. The object centers are accurately predicted with semantic guidance and selectively employed as the basis for feature aggregation and property regression. Specifically, the attention mechanism is utilized in the semantic and spatial similarity calculation, enabling the surrounding feature aggregation for multi-scale objects. An instance-level correlation is established between the camera feature and the BEV feature for cross-modal feature aggregation. Extensive experiments are conducted on the large-scale nuScenes dataset to verify the state-of-the-art performance of the proposed model, especially for occluded objects and far-range detection. The proposed model outperforms the competitive CenterPoint by 10.4% in mAP and 5.4% in NDS, as well as the representative fusion method MVP by 2.8% in mAP and 1.6% in NDS on the val set, indicating its superiority in accurate 3D detection.
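The abstract describes attention-based aggregation of surrounding features around predicted object centers, using both semantic and spatial similarity. The sketch below is only an illustration of that general idea and is not the CenterSFA implementation: the function name `aggregate_around_center`, the square neighbourhood, the dot-product semantic score, and the Gaussian-style distance penalty are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate_around_center(bev, center, radius=1, spatial_sigma=1.0):
    """Attention-weighted pooling of BEV cell features around one center.

    bev: H x W grid (nested lists) of C-dimensional feature vectors.
    center: (row, col) of a predicted object center.
    All names and the scoring rule here are hypothetical, chosen for
    illustration; the paper's actual modules may differ.
    """
    h, w = len(bev), len(bev[0])
    cr, cc = center
    query = bev[cr][cc]  # the center cell's feature acts as the query
    dim = len(query)

    neighbours, scores = [], []
    for r in range(max(0, cr - radius), min(h, cr + radius + 1)):
        for c in range(max(0, cc - radius), min(w, cc + radius + 1)):
            key = bev[r][c]
            # Semantic similarity: scaled dot product with the center feature.
            semantic = sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            # Spatial similarity: penalise cells far from the center.
            spatial = -((r - cr) ** 2 + (c - cc) ** 2) / (2 * spatial_sigma ** 2)
            neighbours.append(key)
            scores.append(semantic + spatial)

    weights = softmax(scores)
    pooled = [
        sum(wt * feat[i] for wt, feat in zip(weights, neighbours))
        for i in range(dim)
    ]
    return pooled, weights


if __name__ == "__main__":
    # Toy 3 x 3 BEV map with 2-dimensional features.
    toy_bev = [[[1.0, 2.0] for _ in range(3)] for _ in range(3)]
    pooled, weights = aggregate_around_center(toy_bev, (1, 1))
    print(pooled, weights)
```

In this toy setup a larger `radius` would widen the neighbourhood for larger objects, which is one simple way to picture multi-scale aggregation; how the published model selects its neighbourhood is not stated in the abstract.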


Pages: 1512-1523

Page count: 12
