Ground-Aware Monocular 3D Object Detection for Autonomous Driving

Cited by: 103
Authors
Liu, Yuxuan [1]
Yuan, Yixuan [2]
Liu, Ming [1]
Affiliations
[1] Hong Kong Univ Sci & Technol, Dept Elect & Comp Engn, Robot & Multipercept Lab, Hong Kong, Peoples R China
[2] City Univ Hong Kong, Dept Elect Engn, Hong Kong, Peoples R China
Keywords
Three-dimensional displays; Cameras; Object detection; Two dimensional displays; Feature extraction; Convolution; Neural networks; Automation technologies for smart cities; deep learning for visual perception; object detection; segmentation and categorization;
DOI
10.1109/LRA.2021.3052442
Chinese Library Classification (CLC) number
TP24 [Robotics];
Discipline classification code
080202 ; 1405 ;
Abstract
Estimating the 3D position and orientation of objects in the environment with a single RGB camera is a critical and challenging task for low-cost urban autonomous driving and mobile robots. Most of the existing algorithms are based on the geometric constraints in 2D-3D correspondence, which stems from generic 6D object pose estimation. We first identify how the ground plane provides additional clues in depth reasoning in 3D detection in driving scenes. Based on this observation, we then improve the processing of 3D anchors and introduce a novel neural network module to fully utilize such application-specific priors in the framework of deep learning. Finally, we introduce an efficient neural network embedded with the proposed module for 3D object detection. We further verify the power of the proposed module with a neural network designed for monocular depth prediction. The two proposed networks achieve state-of-the-art performances on the KITTI 3D object detection and depth prediction benchmarks, respectively.
Pages: 919-926
Number of pages: 8