S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Cited by: 17
Authors
Chen, Xiaotian [1, 2]
Wang, Yuwang [2]
Chen, Xuejin [1]
Zeng, Wenjun [2]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Natural Science Foundation of China
DOI
10.1109/CVPR46437.2021.00305
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Humans can infer the 3D geometry of a scene from a sketch rather than a realistic image, which indicates that spatial structure plays a fundamental role in understanding scene depth. We are the first to explore learning a depth-specific structural representation that captures the essential features for depth estimation and ignores irrelevant style information. Our S2R-DepthNet (Synthetic to Real DepthNet) generalizes directly to unseen real-world data even though it is trained only on synthetic data. S2R-DepthNet consists of: a) a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling the image into domain-invariant structure and domain-specific style components; b) a Depth-specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization; and c) a Depth Prediction (DP) module, which predicts depth from the depth-specific representation. Without access to any real-world images, our method even outperforms state-of-the-art unsupervised domain adaptation methods that use real-world images of the target domain for training. In addition, when using a small amount of labeled real-world data, we achieve state-of-the-art performance under the semi-supervised setting.
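
The three-module data flow named in the abstract (STE, then DSA, then DP) can be illustrated with a toy sketch. This is not the authors' implementation: every function below is a hypothetical, hand-written stand-in for a learned network, and two details are assumptions not stated in the abstract, namely that the attention map is computed from the structure map and that it is applied by element-wise multiplication. The horizontal-gradient "structure" is chosen only because it ignores a global brightness shift, which loosely mimics discarding style.

```python
import math
from typing import List

Image = List[List[float]]  # toy single-channel image: rows of pixel values


def structure_extraction(image: Image) -> Image:
    """Hypothetical stand-in for STE: horizontal gradient magnitude.

    A real STE module is a learned encoder/decoder; this toy version only
    keeps edges, so a constant brightness offset ("style") is discarded.
    """
    out = []
    for row in image:
        last = len(row) - 1
        out.append([abs(row[min(x + 1, last)] - row[x]) for x in range(len(row))])
    return out


def depth_specific_attention(structure: Image) -> Image:
    """Hypothetical stand-in for DSA: a per-pixel sigmoid gate in (0, 1)."""
    return [[1.0 / (1.0 + math.exp(-10.0 * (v - 0.5))) for v in row]
            for row in structure]


def depth_prediction(representation: Image) -> Image:
    """Hypothetical stand-in for DP: maps each value to a depth in (0, 1]."""
    return [[1.0 / (1.0 + v) for v in row] for row in representation]


def s2r_depthnet_forward(image: Image) -> Image:
    """Chain the three stages: structure -> attention gating -> depth."""
    structure = structure_extraction(image)
    attention = depth_specific_attention(structure)
    # Assumption: attention gates the structure map element-wise.
    gated = [[s * a for s, a in zip(s_row, a_row)]
             for s_row, a_row in zip(structure, attention)]
    return depth_prediction(gated)


if __name__ == "__main__":
    toy = [[0.0, 0.0, 1.0, 1.0],
           [0.0, 0.0, 1.0, 1.0]]
    for row in s2r_depthnet_forward(toy):
        print([round(v, 3) for v in row])
```

In this sketch, adding the same constant to every pixel leaves the predicted depth unchanged, which is the kind of style invariance the structural representation is meant to provide; the real model learns this behavior from synthetic data rather than hard-coding it.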
Pages: 3033-3042
Number of pages: 10