HVPR: Hybrid Voxel-Point Representation for Single-stage 3D Object Detection

Cited by: 113

Authors:

Noh, Jongyoun [1]

Lee, Sanghoon [1]

Ham, Bumsub [1]

Affiliations:

[1] Yonsei Univ, Sch Elect & Elect Engn, Seoul, South Korea

Source:

2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021

Funding:

National Research Foundation of Singapore;

Keywords:

DOI:

10.1109/CVPR46437.2021.01437

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory];

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

We address the problem of 3D object detection, that is, estimating 3D object bounding boxes from point clouds. 3D object detection methods exploit either voxel-based or point-based features to represent 3D objects in a scene. Voxel-based features are efficient to extract, but they fail to preserve fine-grained 3D structures of objects. Point-based features, on the other hand, represent the 3D structures more accurately, but extracting them is computationally expensive. We introduce in this paper a novel single-stage 3D detection method that has the merits of both voxel-based and point-based features. To this end, we propose a new convolutional neural network (CNN) architecture, dubbed HVPR, that integrates both features into a single 3D representation effectively and efficiently. Specifically, we augment the point-based features with a memory module to reduce the computational cost. We then aggregate the features in the memory that are semantically similar to each voxel-based one to obtain a hybrid 3D representation in the form of a pseudo image, which allows 3D objects to be localized efficiently in a single stage. We also propose an Attentive Multi-scale Feature Module (AMFM) that extracts scale-aware features, taking into account the sparse and irregular patterns of point clouds. Experimental results on the KITTI dataset demonstrate the effectiveness and efficiency of our approach, which achieves a better trade-off between speed and accuracy.
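The aggregation step described in the abstract (retrieving memory features that are semantically similar to each voxel-based feature) can be illustrated with a generic soft-attention read. The following is a minimal sketch and not the authors' implementation: the function names `read_memory` and `hybrid_features`, the use of cosine similarity with a softmax, and the concatenation of the two feature types are all assumptions made for illustration.

```python
import numpy as np


def read_memory(voxel_feats, memory_items):
    """Soft-attention read (illustrative assumption, not the paper's exact rule).

    voxel_feats:  (N, D) array, one feature per voxel (query).
    memory_items: (M, D) array, stored point-based feature prototypes.
    Returns the (N, D) aggregated features and the (N, M) attention weights.
    """
    eps = 1e-8
    # Cosine similarity between every voxel feature and every memory item.
    v = voxel_feats / (np.linalg.norm(voxel_feats, axis=1, keepdims=True) + eps)
    m = memory_items / (np.linalg.norm(memory_items, axis=1, keepdims=True) + eps)
    sim = v @ m.T  # shape (N, M)

    # Numerically stable softmax over the memory axis.
    sim = sim - sim.max(axis=1, keepdims=True)
    weights = np.exp(sim)
    weights /= weights.sum(axis=1, keepdims=True)

    # Each voxel receives a weighted sum of memory items.
    return weights @ memory_items, weights


def hybrid_features(voxel_feats, memory_items):
    """Concatenate voxel features with their memory reads (assumed fusion)."""
    aggregated, _ = read_memory(voxel_feats, memory_items)
    return np.concatenate([voxel_feats, aggregated], axis=1)  # shape (N, 2D)
```

Under these assumptions, the per-voxel cost at inference depends on the number of memory items M rather than on the number of raw points, which is one way to read the abstract's claim that the memory module reduces computational cost.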


Pages: 14600-14609

Page count: 10
