3D object detection: Learning 3D bounding boxes from scaled down 2D bounding boxes in RGB-D images

Cited by: 31
Authors
Rahman, Mohammad Muntasir [1, 2]
Tan, Yanhao [1 ]
Xue, Jian [1 ]
Shao, Ling [3 ]
Lu, Ke [1 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Engn Sci, 19A Yuquan Rd, Beijing 100049, Peoples R China
[2] Islamic Univ, Dept Comp Sci & Engn, Kushtia 7003, Bangladesh
[3] Incept Inst Artificial Intelligence, Abu Dhabi, U Arab Emirates
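Funding
Beijing Natural Science Foundation; National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords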
3D object detection; RGB-D data; Deep neural networks; Multi-modal region proposal networks; Deep feature learning;
DOI
10.1016/j.ins.2018.09.040
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
3D object detection in RGB-D images is a fast-growing research area in computer vision. In this paper, we study the problem of amodal 3D object detection in RGB-D images and present an efficient 3D object detection system that can predict object location, size, and orientation. Unlike existing methods that use either multistage point cloud processing or pre-computed segmentation masks to generate the 3D bounding boxes, we leverage only 2D region proposals for this task. Given a pair of color and depth images as input, we first predict 2D region proposals with the designed multimodal fusion region proposal networks. We then propose an efficient method to generate 3D bounding boxes from those region proposals by scaling down the 2D bounding boxes with a scale factor and projecting them into 3D space. We evaluate our system on the challenging NYUv2 and SUN RGB-D datasets and compare it with state-of-the-art detection methods. The experimental results show that our method outperforms the state of the art by a remarkable margin with faster detection time. We achieve the best results on the NYUv2 dataset on a 19-class object detection task, while achieving comparable performance with faster detection on the SUN RGB-D dataset on a 10-class object detection task. (C) 2018 Published by Elsevier Inc.
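The abstract's core step, shrinking a 2D box by a scale factor and projecting it into 3D, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a standard pinhole camera model, a depth map in meters, and hypothetical helper names (`shrink_box`, `backproject_box`), intrinsics (`fx`, `fy`, `cx`, `cy`), and scale value chosen only for illustration. The record does not say how the paper picks the scale factor or estimates size and orientation, so those parts are left out.

```python
import numpy as np


def shrink_box(box, scale):
    """Shrink a 2D box (x1, y1, x2, y2) around its center by `scale` (hypothetical helper)."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w = (x2 - x1) * scale / 2.0
    half_h = (y2 - y1) * scale / 2.0
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


def backproject_box(depth, box, fx, fy, cx, cy):
    """Back-project valid depth pixels inside `box` to 3D camera coordinates.

    Assumes a pinhole camera: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    Returns an (N, 3) array of points; pixels with zero depth are skipped.
    """
    x1, y1, x2, y2 = (int(round(v)) for v in box)
    h, w = depth.shape
    x1, x2 = max(x1, 0), min(x2, w)
    y1, y2 = max(y1, 0), min(y2, h)
    vs, us = np.mgrid[y1:y2, x1:x2]  # pixel rows (v) and columns (u)
    z = depth[y1:y2, x1:x2]
    valid = z > 0
    z = z[valid]
    x = (us[valid] - cx) * z / fx
    y = (vs[valid] - cy) * z / fy
    return np.stack([x, y, z], axis=1)


# Synthetic example: a flat surface 2 m from the camera (assumed values).
depth = np.full((480, 640), 2.0)
proposal = (300.0, 220.0, 340.0, 260.0)   # a 2D region proposal
inner = shrink_box(proposal, scale=0.5)   # scaled-down 2D box
points = backproject_box(depth, inner, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
center = np.median(points, axis=0)        # robust estimate of the 3D center
```

One plausible reason to shrink the box before back-projecting is that the central region of a proposal is less likely to include background pixels, so the recovered 3D points are more likely to lie on the object itself.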
Pages: 147-158
Number of pages: 12