YOLO MDE: Object Detection with Monocular Depth Estimation

Cited by: 11
Authors
Yu, Jongsub [1]
Choi, Hyukdoo [1]
Affiliations
[1] Soonchunhyang Univ, Dept Elect Mat & Devices Engn, Asan 31538, South Korea
Keywords
object detection; depth estimation; deep learning
DOI
10.3390/electronics11010076
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents an object detector with depth estimation using monocular camera images. Previous detection studies have typically focused on detecting objects with 2D or 3D bounding boxes. A 3D bounding box consists of a center point, size parameters, and heading information. However, predicting such a complex output generally lowers a model's performance, and the full 3D box is not necessary for risk assessment in autonomous driving. We focus instead on predicting a single depth per object, which is essential for that risk assessment. Our network architecture is based on YOLO v4, a fast and accurate one-stage object detector, to which we add one channel in the output layer for depth estimation. To train depth prediction, we extract the closest depth from the 3D bounding box coordinates of the ground truth labels in the dataset. We compare our model with the latest studies on 3D object detection using the KITTI object detection benchmark and show that it achieves higher detection performance and detection speed than existing models, with comparable depth accuracy.
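The depth-label step described in the abstract can be illustrated with a minimal sketch. It assumes a KITTI-style label in which a box is given by its width and length in metres, the camera-frame z coordinate of its center, and a yaw angle `rotation_y` about the camera's vertical axis. It also assumes that "closest depth" means the smallest z value among the box corners. The function name `closest_depth` and both assumptions are illustrative and are not taken from the paper.

```python
import math


def closest_depth(width, length, z_center, rotation_y):
    """Smallest camera-frame z among the corners of a yawed 3D box.

    Assumption: at rotation_y = 0 the box length lies along the camera
    x axis and its width along the z (depth) axis, as in KITTI labels.
    Top and bottom corners share the same z, so 4 corners are enough.
    """
    c, s = math.cos(rotation_y), math.sin(rotation_y)
    depths = []
    for dx in (-length / 2, length / 2):
        for dz in (-width / 2, width / 2):
            # Rotation about the vertical axis: z' = -sin(ry) * dx + cos(ry) * dz
            depths.append(z_center - s * dx + c * dz)
    return min(depths)


# A 2 m wide, 4 m long box centered 10 m ahead, not rotated:
# the nearest face is half a width closer than the center.
print(closest_depth(2.0, 4.0, 10.0, 0.0))
```

Under these assumptions the returned value would serve as the regression target for the extra depth channel; a different definition of closest depth, such as the nearest point on the box surface along the viewing ray, would need a different computation.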
Pages: 10
Related papers
35 in total
[1]  
Bochkovskiy A, YOLOV4 OPTIMAL SPEED
[2]   M3D-RPN: Monocular 3D Region Proposal Network for Object Detection [J].
Brazil, Garrick ;
Liu, Xiaoming .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :9286-9295
[3]   Multi-View 3D Object Detection Network for Autonomous Driving [J].
Chen, Xiaozhi ;
Ma, Huimin ;
Wan, Ji ;
Li, Bo ;
Xia, Tian .
30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017, :6526-6534
[4]   Monocular 3D Object Detection for Autonomous Driving [J].
Chen, Xiaozhi ;
Kundu, Kaustav ;
Zhang, Ziyu ;
Ma, Huimin ;
Fidler, Sanja ;
Urtasun, Raquel .
2016 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2016, :2147-2156
[5]   Learning Depth-Guided Convolutions for Monocular 3D Object Detection [J].
Ding, Mingyu ;
Huo, Yuqi ;
Yi, Hongwei ;
Wang, Zhe ;
Shi, Jianping ;
Lu, Zhiwu ;
Luo, Ping .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :11669-11678
[6]   CenterNet: Keypoint Triplets for Object Detection [J].
Duan, Kaiwen ;
Bai, Song ;
Xie, Lingxi ;
Qi, Honggang ;
Huang, Qingming ;
Tian, Qi .
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019, :6568-6577
[7]   Vision meets robotics: The KITTI dataset [J].
Geiger, A. ;
Lenz, P. ;
Stiller, C. ;
Urtasun, R. .
INTERNATIONAL JOURNAL OF ROBOTICS RESEARCH, 2013, 32 (11) :1231-1237
[8]   Fast R-CNN [J].
Girshick, Ross .
2015 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2015, :1440-1448
[9]   Rich feature hierarchies for accurate object detection and semantic segmentation [J].
Girshick, Ross ;
Donahue, Jeff ;
Darrell, Trevor ;
Malik, Jitendra .
2014 IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2014, :580-587
[10]  
Huber P.J., 1992, Robust Estimation of a Location Parameter, Breakthroughs in Statistics, P492-518, DOI 10.1007/978-1-4612-4380-9_35