ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes

Times cited: 213
Authors
Qi, Charles R.
Chen, Xinlei [1]
Litany, Or [1, 2]
Guibas, Leonidas J. [1, 2]
Affiliations
[1] Facebook AI, Menlo Pk, CA 94025 USA
[2] Stanford Univ, Stanford, CA 94305 USA
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR) | 2020
Keywords
HOUGH TRANSFORM; DATABASE
DOI
10.1109/CVPR42600.2020.00446
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
3D object detection has progressed quickly thanks to advances in deep learning on point clouds. A few recent works, such as VoteNet, have even achieved state-of-the-art performance with point cloud input alone. However, point cloud data have inherent limitations: they are sparse, lack color information, and often suffer from sensor noise. Images, on the other hand, have high resolution and rich texture, so they can complement the 3D geometry provided by point clouds. Yet how to effectively use image information to assist point-cloud-based detection remains an open question. In this work, we build on top of VoteNet and propose a 3D detection architecture called ImVoteNet, specialized for RGB-D scenes. ImVoteNet is based on fusing 2D votes in images with 3D votes in point clouds. Compared to prior work on multi-modal detection, we explicitly extract both geometric and semantic features from the 2D images, and we leverage camera parameters to lift these features to 3D. To improve the synergy of 2D-3D feature fusion, we also propose a multi-tower training scheme. We validate our model on the challenging SUN RGB-D dataset, where it advances the state of the art by 5.7 mAP. We also provide extensive ablation studies analyzing the contribution of each design choice.
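The abstract states that camera parameters are used to lift 2D image features to 3D. As a rough illustration of the underlying geometry only, the sketch below uses the standard pinhole camera model: a pixel (u, v) with known depth z back-projects to a 3D point, and a 2D image-plane vote (du, dv) can be converted into an approximate 3D offset. The names `PinholeIntrinsics`, `backproject`, `project`, and `lift_2d_vote` are hypothetical, and the "target at the same depth as the seed" assumption in `lift_2d_vote` is an illustrative simplification, not necessarily the exact formulation used by ImVoteNet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PinholeIntrinsics:
    """Hypothetical container for pinhole camera intrinsics (all in pixels)."""
    fx: float  # focal length along the image x axis
    fy: float  # focal length along the image y axis
    cx: float  # principal point, x coordinate
    cy: float  # principal point, y coordinate


def backproject(u, v, z, K):
    """Lift pixel (u, v) observed at depth z to a 3D point in camera coordinates."""
    x = (u - K.cx) * z / K.fx
    y = (v - K.cy) * z / K.fy
    return (x, y, z)


def project(point, K):
    """Project a 3D camera-frame point (x, y, z), z > 0, onto the image plane."""
    x, y, z = point
    return (K.fx * x / z + K.cx, K.fy * y / z + K.cy)


def lift_2d_vote(seed_xyz, vote_uv, K):
    """Convert a 2D image-plane vote (du, dv) at a seed point into a 3D offset.

    Assumption (illustrative only): the voted-for target lies at the same
    depth as the seed, so the depth component of the offset is zero and the
    pixel offsets are scaled by z / f, inverting the pinhole projection.
    """
    _, _, z = seed_xyz
    du, dv = vote_uv
    return (du * z / K.fx, dv * z / K.fy, 0.0)
```

For example, with `K = PinholeIntrinsics(500.0, 500.0, 320.0, 240.0)`, the pixel (420, 240) at depth 2.0 back-projects to roughly (0.4, 0.0, 2.0), and a 2D vote of (50, -25) pixels at that seed lifts to an offset of roughly (0.2, -0.1, 0.0). The same-depth assumption keeps the lifted vote cheap to compute, at the cost of ignoring any depth difference between the seed and the target.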
Pages: 4403-4412
Number of pages: 10