Depth estimation for monocular image based on convolutional neural networks

Cited: 0
Authors
Niu B. [1]
Tang M. [2]
Chen X. [3]
Affiliations
[1] School of Information Engineering, Xinyang Agriculture and Forestry University, No.1 North Circular Road, Pingqiao District, Xinyang
[2] School of Technology, Beijing Forestry University, No.35 Tsinghua East Road, Haidian District, Beijing
[3] School of Finance and Economics, Xinyang Agriculture and Forestry University, No.1 North Circular Road, Pingqiao District, Xinyang
Source
International Journal of Circuits, Systems and Signal Processing | 2021 / Vol. 15
Keywords
Convolutional neural networks; Depth map; Single image; Three-dimensional structure
DOI
10.46300/9106.2021.15.59
Abstract
Perceiving and analyzing the three-dimensional structure of the surrounding environment is indispensable for robots that move autonomously through a scene. Recovering depth information and three-dimensional spatial structure from monocular images is a fundamental task in computer vision, and it is inherently ill-posed: many different three-dimensional scenes can project to the same two-dimensional image. This paper proposes a supervised end-to-end network that performs depth estimation without relying on any subsequent processing, such as probabilistic graphical models or other refinement steps. An encoder-decoder structure with a feature pyramid is used to predict dense depth maps. The encoder adopts a ResNeXt-50 network to extract the main features from the original image. The feature pyramid merges high-level and low-level information so that feature detail is not lost. The decoder connects transposed convolutional and convolutional layers into an up-sampling structure that expands the resolution of the output. Applied to the indoor dataset NYU Depth v2, the proposed structure obtains better predictions than competing methods: it achieves the best results on five metrics and the second-best result on one. © 2021, North Atlantic University Union NAUN. All rights reserved.
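The paper itself provides no code; the following is a minimal PyTorch sketch of the pipeline the abstract describes (a ResNeXt-50 encoder, a feature pyramid that fuses high- and low-level features, and a transposed-convolution up-sampling decoder). The class name FPNDepthNet, the 256-channel pyramid width, and the two-stage up-sampling head are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch (not the authors' code) of the architecture described above:
# ResNeXt-50 encoder, FPN-style feature fusion, transposed-conv decoder.
# Channel widths and class/attribute names are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnext50_32x4d

class FPNDepthNet(nn.Module):  # hypothetical name
    def __init__(self):
        super().__init__()
        backbone = resnext50_32x4d(weights=None)
        # Encoder: reuse the four ResNeXt-50 stages (256/512/1024/2048 channels).
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1,
                                  backbone.relu, backbone.maxpool)
        self.layer1, self.layer2 = backbone.layer1, backbone.layer2
        self.layer3, self.layer4 = backbone.layer3, backbone.layer4
        # Lateral 1x1 convs project each stage to a common 256-channel width.
        self.lat = nn.ModuleList(
            [nn.Conv2d(c, 256, kernel_size=1) for c in (256, 512, 1024, 2048)])

        # Decoder block: a transposed conv doubles resolution, a conv refines it.
        def up_block(cin, cout):
            return nn.Sequential(
                nn.ConvTranspose2d(cin, cout, kernel_size=4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(cout, cout, kernel_size=3, padding=1),
                nn.ReLU(inplace=True))

        self.up1 = up_block(256, 128)
        self.up2 = up_block(128, 64)
        self.head = nn.Conv2d(64, 1, kernel_size=3, padding=1)  # 1-channel depth

    def forward(self, x):
        c1 = self.layer1(self.stem(x))   # 1/4 of input resolution
        c2 = self.layer2(c1)             # 1/8
        c3 = self.layer3(c2)             # 1/16
        c4 = self.layer4(c3)             # 1/32
        # Top-down pyramid: up-sample the coarser map and add the lateral
        # feature, merging high-level semantics with low-level detail.
        p = self.lat[3](c4)
        for c, lat in ((c3, self.lat[2]), (c2, self.lat[1]), (c1, self.lat[0])):
            p = F.interpolate(p, size=c.shape[-2:], mode="nearest") + lat(c)
        # Expand from 1/4 resolution back to the input resolution.
        return self.head(self.up2(self.up1(p)))

if __name__ == "__main__":
    net = FPNDepthNet()
    depth = net(torch.randn(1, 3, 224, 224))
    print(depth.shape)  # torch.Size([1, 1, 224, 224])
```

The top-down addition of lateral features is one common way to realize the "merge high- and low-level information" step; the paper may use a different fusion scheme.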
Pages: 533-540
Page count: 7