PV-RCNN plus plus : Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

被引:184
作者
Shi, Shaoshuai [1 ,2 ]
Jiang, Li [1 ,2 ]
Deng, Jiajun [3 ]
Wang, Zhe [4 ]
Guo, Chaoxu [4 ]
Shi, Jianping [4 ]
Wang, Xiaogang [1 ]
Li, Hongsheng [1 ]
机构
[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China
[2] Max Planck Inst Informat, Saarbrucken, Germany
[3] Univ Sydney, Sydney, NSW, Australia
[4] SenseTime Res, Shanghai, Peoples R China
关键词
3D object Detection; Point clouds; LiDAR; Autonomous driving; Sparse convolution; NETWORKS;
D O I
10.1007/s11263-022-01710-9
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose Point-Voxel Region-based Convolution Neural Networks (PV-RCNNs) for 3D object detection on point clouds. First, we propose a novel 3D detector, PV-RCNN, which boosts the 3D detection performance by deeply integrating the feature learning of both point-based set abstraction and voxel-based sparse convolution through two novel steps, i.e., the voxel-to-keypoint scene encoding and the keypoint-to-grid RoI feature abstraction. Second, we propose an advanced framework, PV-RCNN++, for more efficient and accurate 3D object detection. It consists of two major improvements: sectorized proposal-centric sampling for efficiently producing more representative keypoints, and VectorPool aggregation for better aggregating local point features with much less resource consumption. With these two strategies, our PV-RCNN++ is about 3x faster than PV-RCNN, while also achieving better performance. The experiments demonstrate that our proposed PV-RCNN++ framework achieves state-of-the-art 3D detection performance on the large-scale and highly-competitive Waymo Open Dataset with 10 FPS inference speed on the detection range of 150m x 150m.
引用
收藏
页码:531 / 551
页数:21
相关论文
共 82 条
[1]   M3D-RPN: Monocular 3D Region Proposal Network for Object Detection [J].
Brazil, Garrick ;
Liu, Xiaoming .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9286-9295
[2]   Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image [J].
Chabot, Florian ;
Chaouch, Mohamed ;
Rabarisoa, Jaonary ;
Teuliere, Celine ;
Chateau, Thierry .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :1827-1836
[3]  
Chen Q., 2019, Object as hotspots: an anchor-free 3D object detection approach via firing of hotspots
[4]   Multi-View 3D Object Detection Network for Autonomous Driving [J].
Chen, Xiaozhi ;
Ma, Huimin ;
Wan, Ji ;
Li, Bo ;
Xia, Tian .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6526-6534
[5]   Monocular 3D Object Detection for Autonomous Driving [J].
Chen, Xiaozhi ;
Kundu, Kaustav ;
Zhang, Ziyu ;
Ma, Huimin ;
Fidler, Sanja ;
Urtasun, Raquel .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :2147-2156
[6]   DSGN: Deep Stereo Geometry Network for 3D Object Detection [J].
Chen, Yilun ;
Liu, Shu ;
Shen, Xiaoyong ;
Jia, Jiaya .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :12533-12542
[7]  
Chen YL, 2019, IEEE I CONF COMP VIS, P9774, DOI [10.1109/ICCV.2019.00987, 10.1109/iccv.2019.00987]
[8]   4D Spatio-Temporal ConvNets: Minkowski Convolutional Neural Networks [J].
Choy, Christopher ;
Gwak, JunYoung ;
Savarese, Silvio .
2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019, :3070-3079
[9]   Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis [J].
Dai, Angela ;
Qi, Charles Ruizhongtai ;
Niessner, Matthias .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6545-6554
[10]  
Geiger A, 2012, PROC CVPR IEEE, P3354, DOI 10.1109/CVPR.2012.6248074