VPFNet: Improving 3D Object Detection With Virtual Point Based LiDAR and Stereo Data Fusion

Cited by: 71
Authors
Zhu, Hanqi [1 ]
Deng, Jiajun [2 ]
Zhang, Yu [1 ]
Ji, Jianmin [1 ]
Mao, Qiuyu [1 ]
Li, Houqiang [2 ]
Zhang, Yanyong [1 ]
Affiliations
[1] Univ Sci & Technol China, Sch Comp Sci & Technol, Hefei 230027, Peoples R China
[2] Univ Sci & Technol China, Dept Elect Engn & Informat Sci, Hefei 230027, Peoples R China
Keywords
3D object detection; multiple sensors; point clouds; stereo images; R-CNN;
DOI
10.1109/TMM.2022.3189778
CLC classification: TP [Automation technology, computer technology]
Discipline code: 0812
Abstract
It has been well recognized that fusing the complementary information from depth-aware LiDAR point clouds and semantic-rich stereo images would benefit 3D object detection. Nevertheless, it is non-trivial to explore the inherently unnatural interaction between sparse 3D points and dense 2D pixels. To ease this difficulty, recent approaches generally project the 3D points onto the 2D image plane to sample the image data and then aggregate the data at the points. However, these approaches often suffer from the mismatch between the resolution of point clouds and RGB images, leading to sub-optimal performance. Specifically, taking the sparse points as the multi-modal data aggregation locations causes severe information loss for high-resolution images, which in turn undermines the effectiveness of multi-sensor fusion. In this paper, we present VPFNet, a new architecture that aligns and aggregates the point cloud and image data at "virtual" points. In particular, with a density lying between that of the 3D points and the 2D pixels, the virtual points can nicely bridge the resolution gap between the two sensors and thus preserve more information for processing. Moreover, we also investigate data augmentation techniques that can be applied to both point clouds and RGB images, as data augmentation has made a non-negligible contribution to 3D object detectors to date. We have conducted extensive experiments on the KITTI dataset and have observed good performance compared to state-of-the-art methods. Remarkably, our VPFNet achieves 83.21% moderate $AP_{3D}$ and 91.86% moderate $AP_{BEV}$ on the KITTI test set. The network design also takes computational efficiency into consideration: we achieve 15 FPS on a single NVIDIA RTX 2080Ti GPU.
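The core idea summarized in the abstract, placing fusion locations at virtual points denser than the raw LiDAR cloud, then projecting them onto the image plane to gather pixel features, can be illustrated with a minimal NumPy sketch. This is a hypothetical simplification, not the authors' implementation: the jitter-based densification, the camera parameters, and all function names below are assumptions made purely for illustration.

```python
import numpy as np

def project_to_image(points_xyz, P):
    # Homogenize Nx3 camera-frame points and apply a 3x4 projection matrix.
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    uvw = pts_h @ P.T
    return uvw[:, :2] / uvw[:, 2:3]  # perspective divide -> Nx2 pixel coords

def make_virtual_points(points_xyz, k=4, scale=0.1, seed=0):
    # Densify the sparse cloud with k jittered copies per real point
    # (a stand-in for the paper's virtual-point generation scheme).
    rng = np.random.default_rng(seed)
    offsets = rng.normal(scale=scale, size=(points_xyz.shape[0], k, 3))
    return (points_xyz[:, None, :] + offsets).reshape(-1, 3)

def sample_image_features(image, pixels):
    # Nearest-neighbor feature sampling at the projected pixel locations.
    h, w = image.shape[:2]
    u = np.clip(np.round(pixels[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.round(pixels[:, 1]).astype(int), 0, h - 1)
    return image[v, u]

# Toy usage with a hypothetical pinhole camera (focal 700, center at 320x240).
K = np.array([[700.0,   0.0, 320.0],
              [  0.0, 700.0, 240.0],
              [  0.0,   0.0,   1.0]])
P = np.hstack([K, np.zeros((3, 1))])
lidar_pts = np.array([[0.0, 0.0, 10.0],
                      [1.0, 0.5,  8.0]])  # two sparse points in front of camera
image = np.random.default_rng(1).random((480, 640, 3))  # stand-in RGB feature map

virtual_pts = make_virtual_points(lidar_pts, k=4)  # denser than the LiDAR input
pixels = project_to_image(virtual_pts, P)
img_feats = sample_image_features(image, pixels)
fused = np.hstack([virtual_pts, img_feats])  # (3D coords + image features) per virtual point
```

With 4 virtual points per LiDAR point, the two input points yield eight fusion locations, each carrying both geometry and sampled image features; the resolution gap the abstract describes is narrowed by choosing `k` between the sparsity of the cloud and the density of the pixels.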
Pages: 5291-5304
Page count: 14