S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation

Cited by: 22
Authors
Chen, Xiaotian [1 ,2 ]
Wang, Yuwang [2 ]
Chen, Xuejin [1 ]
Zeng, Wenjun [2 ]
Affiliations
[1] Univ Sci & Technol China, Hefei, Peoples R China
[2] Microsoft Res Asia, Beijing, Peoples R China
Source
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021 | 2021
Funding
National Natural Science Foundation of China;
DOI
10.1109/CVPR46437.2021.00305
CLC Classification Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes. We are the first to explore the learning of a depth-specific structural representation, which captures the essential features for depth estimation and ignores irrelevant style information. Our S2R-DepthNet (Synthetic-to-Real DepthNet) generalizes well to unseen real-world data even though it is trained only on synthetic data. S2R-DepthNet consists of: a) a Structure Extraction (STE) module, which extracts a domain-invariant structural representation from an image by disentangling the image into domain-invariant structure and domain-specific style components; b) a Depth-Specific Attention (DSA) module, which learns task-specific knowledge to suppress depth-irrelevant structures for better depth estimation and generalization; and c) a Depth Prediction (DP) module, which predicts depth from the depth-specific representation. Without access to any real-world images, our method even outperforms state-of-the-art unsupervised domain adaptation methods that use real-world images of the target domain for training. In addition, when using a small amount of labeled real-world data, we achieve state-of-the-art performance in the semi-supervised setting.
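The STE → DSA → DP data flow described above can be sketched as follows. This is a minimal illustration only: the function bodies below are hypothetical placeholders (a gradient-magnitude map, a min-max normalization, and a fixed elementwise mapping), not the learned CNN modules of the paper; only the three-stage pipeline itself comes from the abstract.

```python
import numpy as np

def structure_extraction(image):
    """STE: keep a domain-invariant 'structure' map, discard 'style'.
    Placeholder: a gradient-magnitude map stands in for the learned
    structure encoder."""
    gy, gx = np.gradient(image)
    return np.hypot(gx, gy)

def depth_specific_attention(structure):
    """DSA: an attention map meant to suppress depth-irrelevant structures.
    Placeholder: min-max normalization to [0, 1]."""
    rng = structure.max() - structure.min()
    return (structure - structure.min()) / (rng + 1e-8)

def depth_prediction(structure, attention):
    """DP: predict depth from the attended, depth-specific representation.
    Placeholder: elementwise gating followed by a fixed mapping."""
    return 1.0 / (1.0 + structure * attention)

def s2r_depthnet(image):
    """Full pipeline: image -> structure -> attended structure -> depth."""
    s = structure_extraction(image)
    a = depth_specific_attention(s)
    return depth_prediction(s, a)

if __name__ == "__main__":
    img = np.random.rand(64, 64).astype(np.float32)
    depth = s2r_depthnet(img)
    print(depth.shape)  # (64, 64)
```

Because the style component is discarded at the STE stage, the later stages never see domain-specific appearance, which is the mechanism the abstract credits for synthetic-to-real generalization.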
Pages: 3033-3042
Page count: 10