SMOKE: Single-Stage Monocular 3D Object Detection via Keypoint Estimation

Times cited: 276
Authors
Liu, Zechen [1]
Wu, Zizhang [1]
Toth, Roland [2]
Affiliations
[1] ZongMu Tech, Beijing, Peoples R China
[2] TU/e (Eindhoven University of Technology), Eindhoven, Netherlands
Source
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS (CVPRW 2020) | 2020
DOI
10.1109/CVPRW50498.2020.00506
CLC classification number (Chinese Library Classification)
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Estimating the 3D orientation and translation of objects is essential for infrastructure-less autonomous navigation and driving. In the case of monocular vision, successful methods have mainly been based on two ingredients: (i) a network generating 2D region proposals and (ii) an R-CNN structure predicting the 3D object pose from the acquired regions of interest. We argue that the 2D detection network is redundant and introduces non-negligible noise for 3D detection. Hence, in this paper we propose a novel 3D object detection method, named SMOKE, that predicts a 3D bounding box for each detected object by combining a single keypoint estimate with regressed 3D variables. As a second contribution, we propose a multi-step disentangling approach for constructing the 3D bounding box, which significantly improves both training convergence and detection accuracy. In contrast to previous 3D detection techniques, our method requires no complicated pre/post-processing, extra data, or refinement stage. Despite its structural simplicity, our proposed SMOKE network outperforms all existing monocular 3D detection methods on the KITTI dataset, achieving state-of-the-art results on both 3D object detection and bird's-eye-view evaluation. The code is available at https://github.com/lzccccc/SMOKE.
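The abstract's core idea, recovering a 3D box from a single keypoint plus regressed 3D variables, can be illustrated with the short sketch below. It is a minimal sketch under stated assumptions, not the authors' code: it assumes a pinhole camera with intrinsic matrix `K`, treats the keypoint as the projected 3D box center, and uses hypothetical residual encodings (depth as `depth_mean + depth_residual * depth_std`, dimensions as a class-mean size scaled by `exp(dim_residual)`, global yaw from an observation angle given as (sin, cos)). The function and parameter names are illustrative, and the multi-step disentangling of the training loss is not shown.

```python
import numpy as np


def decode_box3d(keypoint, offset, depth_residual, dim_residual, alpha_sincos,
                 K, depth_mean, depth_std, dim_mean):
    """Recover (center, dims, yaw) from a projected-center keypoint.

    Hypothetical encodings (assumptions, not taken from the paper's code):
    depth = mean + residual * std, dims = mean * exp(residual).
    """
    # Decode depth from a normalized residual.
    z = depth_mean + depth_residual * depth_std
    # Sub-pixel refined image location of the projected 3D center.
    u, v = np.asarray(keypoint, float) + np.asarray(offset, float)
    # Back-project through the pinhole model: [x, y, z]^T = z * K^-1 [u, v, 1]^T.
    center = z * np.linalg.solve(K, np.array([u, v, 1.0]))
    # Dimensions (h, w, l) as an exponential scaling of a class-mean size.
    dims = np.asarray(dim_mean, float) * np.exp(dim_residual)
    # Observation angle from (sin, cos), converted to a global yaw.
    alpha = np.arctan2(alpha_sincos[0], alpha_sincos[1])
    yaw = alpha + np.arctan2(center[0], center[2])
    return center, dims, yaw


def box_corners(center, dims, yaw):
    """Return the 8 corners (3 x 8) of a box whose geometric center is `center`.

    Assumes camera coordinates (x right, y down, z forward), dims ordered
    (h, w, l), length along x before rotation, and yaw about the y-axis.
    """
    h, w, l = dims
    x = l / 2 * np.array([1, 1, -1, -1, 1, 1, -1, -1])
    y = h / 2 * np.array([1, 1, 1, 1, -1, -1, -1, -1])
    z = w / 2 * np.array([1, -1, -1, 1, 1, -1, -1, 1])
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return R @ np.vstack([x, y, z]) + np.asarray(center, float).reshape(3, 1)
```

For example, with `K = [[700, 0, 600], [0, 700, 180], [0, 0, 1]]`, a keypoint at the principal point and a decoded depth of 20 place the box center at `(0, 0, 20)`. Back-projecting one keypoint with a regressed depth is what lets a single-stage design skip the 2D region-proposal step the abstract argues is redundant.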
Pages: 4289-4298
Page count: 10