Robust LiDAR-Camera 3-D Object Detection With Object-Level Feature Fusion

Cited by: 2
Authors
Chen, Yongxiang [1 ,2 ,3 ]
Yan, Fuwu [1 ,2 ,3 ]
Yin, Zhishuai [1 ,2 ,3 ]
Nie, Linzhen [1 ,2 ,3 ]
Tao, Bo [1 ,2 ,3 ]
Miao, Mingze [1 ,2 ,3 ]
Zheng, Ningyu [1 ,2 ,3 ]
Zhang, Pei [1 ,2 ,3 ]
Zeng, Junyuan [1 ,2 ,3 ]
Affiliations
[1] Wuhan Univ Technol, Hubei Key Lab Adv Technol Automot Components, Hubei Collaborat Innovat Ctr Automot Components T, Sch Automot Engn, Wuhan 430070, Peoples R China
[2] Wuhan Univ Technol, Hubei Res Ctr New Energy & Intelligent Connected, Wuhan 430070, Peoples R China
[3] Adv Energy Sci & Technol Guangdong Lab, Foshan Xianhu Lab, Foshan 528200, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
3-D object detection; autonomous driving; multimodal fusion; object-level fusion;
DOI
10.1109/JSEN.2024.3436834
Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];
Discipline Classification Code
0808 ; 0809 ;
Abstract
Cross-modal fusion methods enhance 3-D object detection by leveraging multimodal complementarity. However, current methods, which seek to establish pointwise or region of interest-wise (RoI-wise) cross-modal correspondence, have their own limitations: pointwise methods struggle with weak alignment and inaccurate depth estimation, while RoI-wise methods, which use RoIs as the basis for sampling and fusing features, produce ambiguous object-level features. To address this, this article introduces the object-level feature fusion network (OFFNet), a 3-D detector that performs cross-modal object-level feature fusion (COFF) discriminatively. To combine spatial information from peripheral light detection and ranging (LiDAR) features with unambiguous semantics from foreground image features, our approach performs fusion in two stages. The first stage unifies multimodal features into homogeneous representations in the point cloud space to facilitate cross-modal feature interaction. For this purpose, a point enhancement module (PEM) is designed to generate two sets of keypoints. A set of original keypoints is sampled from foreground LiDAR points to aggregate multidimensional LiDAR features in the COFF procedure. Meanwhile, a set of pseudokeypoints, generated by shifting the foreground keypoints toward the objects' geometric centers, is used as sampling points to aggregate foreground image features. In the second stage, we obtain robust object-level features by hierarchically fusing LiDAR and image features specific to each instance. Coarse 3-D candidate boxes are gridded and segmented into external and internal layers: the external layer aggregates LiDAR features from adjacent original keypoints, while the internal layer aggregates image features from neighboring pseudokeypoints.
Experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and nuScenes benchmark demonstrate OFFNet's state-of-the-art (SOTA) performance, particularly excelling in the challenging "cyclist" category.
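The pseudokeypoint generation described in the abstract (shifting foreground keypoints toward each object's geometric center) can be sketched as below. This is a minimal illustrative sketch, not the paper's implementation: the function name, the shift-ratio parameter `alpha`, and the array shapes are all assumptions for demonstration.

```python
import numpy as np

def shift_to_pseudokeypoints(keypoints, centers, alpha=1.0):
    """Shift foreground keypoints toward their objects' geometric centers.

    keypoints: (N, 3) array of foreground keypoint coordinates.
    centers:   (N, 3) array giving the geometric center of the object
               each keypoint belongs to (hypothetical pairing).
    alpha:     shift ratio in [0, 1]; 1.0 moves a keypoint all the
               way to its object's center (illustrative parameter).
    """
    keypoints = np.asarray(keypoints, dtype=float)
    centers = np.asarray(centers, dtype=float)
    # Linear interpolation between each keypoint and its object center.
    return keypoints + alpha * (centers - keypoints)

# Example: two foreground keypoints, both belonging to an object
# centered at (1, 1, 1), shifted halfway toward that center.
kp = np.array([[0.0, 0.0, 0.0], [2.0, 2.0, 0.0]])
ctr = np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]])
pseudo = shift_to_pseudokeypoints(kp, ctr, alpha=0.5)
```

The shifted points would then serve as sampling locations for aggregating foreground image features, per the abstract's description of the PEM.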
Pages: 29108-29120
Number of pages: 13