Revealing the Reciprocal Relations between Self-Supervised Stereo and Monocular Depth Estimation

Cited by: 20

Authors:

Chen, Zhi [1]

Ye, Xiaoqing [2]

Yang, Wei [1]

Xu, Zhenbo [1]

Tan, Xiao [2]

Zou, Zhikang [2]

Ding, Errui [2]

Zhang, Xinming [1]

Huang, Liusheng [1]

Affiliations:

[1] Univ Sci & Technol China, Hefei, Peoples R China

[2] Baidu Inc, Dept Comp Vis Technol VIS, Beijing, Peoples R China

Source:

2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021) | 2021

DOI:

10.1109/ICCV48922.2021.01524

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Current self-supervised depth estimation algorithms mainly focus on either stereo or monocular estimation alone, neglecting the reciprocal relations between them. In this paper, we propose a simple yet effective framework to improve both stereo and monocular depth estimation by leveraging the underlying complementary knowledge of the two tasks. Our approach consists of three stages. First, the proposed stereo matching network, termed StereoNet, is trained on image pairs in a self-supervised manner. Second, we introduce an occlusion-aware distillation (OA Distillation) module, which leverages the depths predicted by StereoNet in non-occluded regions to train our monocular depth estimation network, named SingleNet. Finally, we design an occlusion-aware fusion (OA Fusion) module, which generates more reliable depths by fusing the estimated depths from StereoNet and SingleNet given the occlusion map. Furthermore, we take the fused depths as pseudo labels to supervise StereoNet in turn, which further improves StereoNet's performance. Extensive experiments on the KITTI dataset demonstrate the effectiveness of the proposed framework. We achieve new state-of-the-art (SOTA) performance on both stereo and monocular depth estimation tasks.
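The occlusion-aware steps described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not specify the actual loss or fusion rule, so everything here is an assumption for illustration: the function names `oa_distillation_loss` and `oa_fuse` are hypothetical, the distillation term is assumed to be a masked L1 difference, and the fusion is assumed to be a hard per-pixel selection (stereo depth where visible, monocular depth where occluded). The paper's real modules may differ.

```python
import numpy as np


def oa_distillation_loss(mono_depth, stereo_depth, occlusion_mask):
    """Assumed masked L1 loss: compare depths only in non-occluded pixels."""
    mono = np.asarray(mono_depth, dtype=np.float64)
    stereo = np.asarray(stereo_depth, dtype=np.float64)
    visible = ~np.asarray(occlusion_mask, dtype=bool)
    if not visible.any():
        # No reliable stereo pixels, so there is nothing to distill from.
        return 0.0
    return float(np.abs(mono - stereo)[visible].mean())


def oa_fuse(stereo_depth, mono_depth, occlusion_mask):
    """Assumed hard fusion: monocular depth where occluded, stereo elsewhere."""
    stereo = np.asarray(stereo_depth, dtype=np.float64)
    mono = np.asarray(mono_depth, dtype=np.float64)
    occluded = np.asarray(occlusion_mask, dtype=bool)
    return np.where(occluded, mono, stereo)


# Toy 2x2 depth maps (arbitrary units); one pixel is marked as occluded.
stereo = np.array([[10.0, 20.0], [30.0, 40.0]])
mono = np.array([[11.0, 19.0], [33.0, 44.0]])
occluded = np.array([[False, True], [False, False]])

loss = oa_distillation_loss(mono, stereo, occluded)
fused = oa_fuse(stereo, mono, occluded)
```

In this toy case the occluded pixel takes its value from the monocular map, while the distillation term ignores that pixel entirely. In the framework described above, a fused map like `fused` would then serve as the pseudo label for retraining StereoNet.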

Pages: 15509-15518

Page count: 10
