Monocular image depth prediction without depth sensors: An unsupervised learning method

Cited by: 6

Authors:

Chen, Songnan [1,2,3]

Tang, Mengxia [1,3]

Kan, Jiangming [1,3]

Affiliations:

[1] Beijing Forestry Univ, Sch Technol, 35 Qinghua East Rd, Beijing 100083, Peoples R China

[2] Xinyang Agr & Forestry Univ, Sch Informat Engn, 1 North Circular Rd, Xinyang 464000, Henan, Peoples R China

[3] Key Lab State Forestry Adm Forestry Equipment & A, 35 Qinghua East Rd, Beijing 100083, Peoples R China

Source:

APPLIED SOFT COMPUTING | 2020 / Vol. 97

Funding:

National Natural Science Foundation of China;

Keywords:

Monocular image; Depth prediction; Binocular stereo vision; Unsupervised method; VISIBILITY; NETWORKS;

DOI:

10.1016/j.asoc.2020.106804

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Monocular image depth prediction is an interesting challenge in three-dimensional (3D) perception whose purpose is to recover the geometric features of 3D scenes from two-dimensional (2D) images. Deep learning methods for monocular depth prediction have yielded good results, but they treat the task as a supervised deep regression problem. A significant weakness of these methods is the need to collect large amounts of depth measurement data in real scenes for training. In this paper, we design a novel convolutional neural network (CNN) with an encoder-decoder structure that estimates the depth map from monocular RGB images based on the basic principles of binocular stereo vision, and we train the network from scratch on rectified stereo pairs in an unsupervised manner, without any depth data. We also explore a new upsampling strategy to improve the output resolution and introduce a new dynamic optimization strategy to improve training speed and prediction accuracy. Extensive experiments on the publicly available KITTI and Cityscapes datasets demonstrate that our approach is more accurate than competing methods. The results also indicate that our CNN model can be used for depth completion from LIDAR images. (C) 2020 Elsevier B.V. All rights reserved.
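The abstract does not spell out the training objective, so the following is only a generic sketch of the binocular stereo principle it refers to, not the authors' network or loss. It assumes a rectified pair in which a left-image pixel at column x appears at column x - d in the right image, so depth follows Z = f * B / d, and a predicted disparity map can be scored by how well the warped right image reproduces the left one. All function names, the L1 photometric error, and the camera values are illustrative assumptions.

```python
import numpy as np


def disparity_to_depth(disparity, focal_px, baseline_m, eps=1e-6):
    """Rectified-stereo relation Z = f * B / d (d in pixels, Z in metres)."""
    disparity = np.asarray(disparity, dtype=np.float64)
    # Clamp tiny disparities to avoid division by zero.
    return focal_px * baseline_m / np.maximum(disparity, eps)


def reconstruct_left_from_right(right, disparity):
    """Warp a grayscale right image toward the left view.

    Each left pixel (y, x) is sampled from the right image at (y, x - d),
    using 1-D linear interpolation along the row.
    """
    right = np.asarray(right, dtype=np.float64)
    disparity = np.asarray(disparity, dtype=np.float64)
    h, w = right.shape
    xs = np.arange(w, dtype=np.float64)
    out = np.empty((h, w), dtype=np.float64)
    for y in range(h):
        src = np.clip(xs - disparity[y], 0, w - 1)
        out[y] = np.interp(src, xs, right[y])
    return out


def photometric_l1(left, right, disparity):
    """Mean absolute error between the left image and its reconstruction.

    Hypothetical stand-in for an unsupervised training signal: no ground
    truth depth is needed, only the stereo pair itself.
    """
    recon = reconstruct_left_from_right(right, disparity)
    return float(np.mean(np.abs(np.asarray(left, dtype=np.float64) - recon)))


if __name__ == "__main__":
    h, w, true_d = 4, 16, 2.0
    xs = np.arange(w, dtype=np.float64)
    right = np.tile(xs, (h, 1))                           # horizontal ramp
    left = np.tile(np.clip(xs - true_d, 0, None), (h, 1))  # same scene, shifted

    good = photometric_l1(left, right, np.full((h, w), true_d))
    bad = photometric_l1(left, right, np.zeros((h, w)))
    print("loss with correct disparity:", good)
    print("loss with zero disparity:", bad)

    # Illustrative camera values only (not taken from the paper).
    print("depth in metres:", disparity_to_depth(true_d, 720.0, 0.54))
```

In this toy setup the correct disparity yields a lower reconstruction error than a wrong one, which is the property that lets stereo pairs supervise a depth network without depth sensors.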

References

Pages: 15

62 entries in total

[1] [Anonymous], Conf. Comput. Vis. (ICCV)

[2] [Anonymous], 2006, 2006 IEEE COMP SOC C

[3] [Anonymous], 2006, Advances in Neural Information Processing Systems, DOI 10.1109/TPAMI.2015.2505283a

[4] [Anonymous], 2012, 2012 IEEE COMP SOC C

[5] Bian JW, 2019, ADV NEUR IN, V32

[6] PatchMatch Stereo - Stereo Matching with Slanted Support Windows [J]. Bleyer, Michael; Rhemann, Christoph; Rother, Carsten. PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2011, 2011

[7] Towards Scene Understanding: Unsupervised Monocular Depth Estimation with Semantic-aware Representation [J]. Chen, Po-Yi; Liu, Alexander H.; Liu, Yen-Cheng; Wang, Yu-Chiang Frank. 2019 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2019), 2019: 2619-2627

[8] Encoder-decoder with densely convolutional networks for monocular depth estimation [J]. Chen, Songnan; Tang, Mengxia; Kan, Jiangming. JOURNAL OF THE OPTICAL SOCIETY OF AMERICA A-OPTICS IMAGE SCIENCE AND VISION, 2019, 36 (10): 1709-1718

[9] Predicting Depth from Single RGB Images with Pyramidal Three-Streamed Networks [J]. Chen, Songnan; Tang, Mengxia; Kan, Jiangming. SENSORS, 2019, 19 (03)

[10] Clevert D.-A., 2016, ARXIV
