Weakly supervised monocular depth estimation method based on stereo matching labels

Cited by: 1

Authors:

Zhang, Zhimin [1]

Qiao, Jianzhong [1]

Lin, Shukuan [1]

Liu, Han [2]

Affiliations:

[1] Northeastern Univ, Sch Comp Sci & Engn, Shenyang, Peoples R China

[2] Shenzhen Univ, Coll Comp Sci & Software Engn, Shenzhen, Peoples R China

Source:

Journal of Electronic Imaging | 2020, Volume 29, Issue 05

Keywords:

monocular depth estimation; weak supervision; self-supervised learning; stereo matching; ground truth labels; cost aggregation

DOI:

10.1117/1.JEI.29.5.053013

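Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]

Subject classification codes:

0808; 0809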

Abstract:

Current self-supervised monocular methods learn only by imposing consistency constraints, without relying on any geometric constraints or ground truth depth constraints, which leaves the accuracy of the estimation result suboptimal. In contrast, a stereo matching network usually follows the geometric process of the traditional stereo algorithm, which makes its estimation result more accurate. Inspired by these findings, we propose a weakly supervised monocular learning approach that uses the disparity maps generated by a self-supervised stereo matching model as the "ground truth" labels to train a self-supervised monocular depth estimation model. To obtain more accurate ground truth labels, we improve the geometry and context layer in self-supervised deep stereo regression by replacing the 3D convolutional layer with a guided aggregation layer. This design also reduces computational cost and memory consumption. We then build our weakly supervised monocular model by improving the U-Net model and designing a loss function composed of a weakly supervised cost and a self-supervised cost. The estimation results obtained using our model outperform those of the existing self-supervised depth estimation methods under the same training conditions on the challenging KITTI dataset, and the results can easily be generalized to the Cityscapes dataset. © 2020 SPIE and IS&T
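The abstract describes a loss that combines a weakly supervised cost (against stereo-derived disparity labels) with a self-supervised cost. The record does not give the actual formulation, so the following is only a minimal sketch under stated assumptions: an L1 penalty against the pseudo-labels, an L1 photometric reconstruction term on grayscale images, and a hypothetical weighting parameter `alpha`. The function names `warp_right_to_left` and `combined_loss` are illustrative and do not come from the paper.

```python
import numpy as np

def warp_right_to_left(right, disp):
    """Reconstruct the left view by sampling the right image at x - disp.

    Assumes rectified grayscale images of shape (H, W) and uses linear
    interpolation along each row; out-of-range samples are clamped.
    """
    h, w = right.shape
    xs = np.arange(w)[None, :] - disp       # sampling positions in the right image
    xs = np.clip(xs, 0, w - 1)
    x0 = np.floor(xs).astype(int)           # left neighbour of each sample
    x1 = np.minimum(x0 + 1, w - 1)          # right neighbour, clamped to the border
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1 - frac) * right[rows, x0] + frac * right[rows, x1]

def combined_loss(pred_disp, pseudo_disp, left, right, alpha=0.5):
    """Weighted sum of a weak-supervision term and a self-supervised term.

    alpha is a hypothetical weight, not a value taken from the paper.
    """
    # Weakly supervised cost: L1 distance to the stereo-derived pseudo-labels.
    weak = np.mean(np.abs(pred_disp - pseudo_disp))
    # Self-supervised cost: L1 photometric error of the reconstructed left view.
    recon = warp_right_to_left(right, pred_disp)
    self_sup = np.mean(np.abs(left - recon))
    return alpha * weak + (1 - alpha) * self_sup
```

With identical left and right images and a zero predicted disparity, the photometric term vanishes and the loss reduces to `alpha` times the pseudo-label error. Published self-supervised methods typically add further terms such as structural similarity or smoothness; those are omitted here for brevity.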


Pages: 21
