PV-RCNN++: Point-Voxel Feature Set Abstraction With Local Vector Representation for 3D Object Detection

Cited by: 194

Authors:

Shi, Shaoshuai ^{[1,2]}

Jiang, Li ^{[1,2]}

Deng, Jiajun ^{[3]}

Wang, Zhe ^{[4]}

Guo, Chaoxu ^{[4]}

Shi, Jianping ^{[4]}

Wang, Xiaogang ^{[1]}

Li, Hongsheng ^{[1]}

Affiliations:

[1] Chinese Univ Hong Kong, Hong Kong, Peoples R China

[2] Max Planck Inst Informat, Saarbrucken, Germany

[3] Univ Sydney, Sydney, NSW, Australia

[4] SenseTime Res, Shanghai, Peoples R China

Source:

INTERNATIONAL JOURNAL OF COMPUTER VISION | 2023 / Vol. 131 / Issue 02

Keywords:

3D object Detection; Point clouds; LiDAR; Autonomous driving; Sparse convolution; NETWORKS;

DOI:

10.1007/s11263-022-01710-9

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

3D object detection is receiving increasing attention from both industry and academia thanks to its wide applications in various fields. In this paper, we propose Point-Voxel Region-based Convolution Neural Networks (PV-RCNNs) for 3D object detection on point clouds. First, we propose a novel 3D detector, PV-RCNN, which boosts the 3D detection performance by deeply integrating the feature learning of both point-based set abstraction and voxel-based sparse convolution through two novel steps, i.e., the voxel-to-keypoint scene encoding and the keypoint-to-grid RoI feature abstraction. Second, we propose an advanced framework, PV-RCNN++, for more efficient and accurate 3D object detection. It consists of two major improvements: sectorized proposal-centric sampling for efficiently producing more representative keypoints, and VectorPool aggregation for better aggregating local point features with much less resource consumption. With these two strategies, our PV-RCNN++ is about 3x faster than PV-RCNN, while also achieving better performance. The experiments demonstrate that our proposed PV-RCNN++ framework achieves state-of-the-art 3D detection performance on the large-scale and highly-competitive Waymo Open Dataset with 10 FPS inference speed on the detection range of 150m x 150m.

Citation

Pages: 531-551

Page count: 21

References: 82 in total

