What You See is What You Get: Exploiting Visibility for 3D Object Detection

Cited by: 27
Authors
Hu, Peiyun [1]
Ziglar, Jason [2]
Held, David [1]
Ramanan, Deva [1,2]
Affiliations
[1] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
[2] Argo AI, Pittsburgh, PA USA
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020) | 2020
Keywords
RECOGNITION; IMAGES
DOI
10.1109/CVPR42600.2020.01101
CLC Classification Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Recent advances in 3D sensing have created unique challenges for computer vision. One fundamental challenge is finding a good representation for 3D sensor data. Most popular representations (such as PointNet) were proposed in the context of processing truly 3D data (e.g., points sampled from mesh models), ignoring the fact that 3D sensed data such as a LiDAR sweep is in fact 2.5D. We argue that representing 2.5D data as a collection of (x, y, z) points fundamentally destroys hidden information about freespace. In this paper, we demonstrate that such knowledge can be efficiently recovered through 3D raycasting and readily incorporated into batch-based gradient learning. We describe a simple approach to augmenting voxel-based networks with visibility: we add a voxelized visibility map as an additional input stream. In addition, we show that visibility can be combined with two crucial modifications common to state-of-the-art 3D detectors: synthetic data augmentation of virtual objects and temporal aggregation of LiDAR sweeps over multiple time frames. On the NuScenes 3D detection benchmark, we show that adding this extra visibility stream significantly improves the overall detection accuracy of a state-of-the-art 3D detector.
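For concreteness, the sketch below shows one way the freespace knowledge described above could be recovered by raycasting LiDAR returns through a voxel grid. This is a minimal Python illustration, not the authors' implementation: the grid parameters (GRID_MIN, VOXEL_SIZE, GRID_SHAPE) and the naive half-voxel ray stepping are assumptions, and an exact voxel traversal such as Amanatides-Woo would normally replace the inner loop for efficiency.

import numpy as np

# Illustrative placeholders, not values from the paper.
GRID_MIN = np.array([-50.0, -50.0, -3.0])  # metric origin of the voxel grid
VOXEL_SIZE = 0.5                           # voxel edge length in meters
GRID_SHAPE = (200, 200, 16)                # voxels per axis (x, y, z)

UNKNOWN, FREE, OCCUPIED = 0, 1, 2

def visibility_map(points, origin=np.zeros(3)):
    """Label every voxel a LiDAR ray passes through as FREE and every
    voxel containing a return as OCCUPIED; untouched voxels stay UNKNOWN."""
    vis = np.full(GRID_SHAPE, UNKNOWN, dtype=np.uint8)
    for p in points:  # points: (N, 3) array of LiDAR returns in the sensor frame
        ray = p - origin
        dist = np.linalg.norm(ray)
        # Naive half-voxel stepping along the ray; an exact traversal
        # (Amanatides-Woo) is faster and skips no voxels.
        n_steps = max(int(dist / (0.5 * VOXEL_SIZE)), 1)
        for t in np.linspace(0.0, 1.0, n_steps, endpoint=False):
            idx = np.floor((origin + t * ray - GRID_MIN) / VOXEL_SIZE).astype(int)
            if np.all(idx >= 0) and np.all(idx < GRID_SHAPE):
                vis[tuple(idx)] = FREE
        end = np.floor((p - GRID_MIN) / VOXEL_SIZE).astype(int)
        if np.all(end >= 0) and np.all(end < GRID_SHAPE):
            vis[tuple(end)] = OCCUPIED
    return vis

The resulting FREE/OCCUPIED/UNKNOWN volume can then be concatenated with a detector's voxel features, playing the role of the additional visibility input stream the abstract describes.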
Pages: 10998-11006
Page count: 9