Self-supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance

Citations: 198
Authors
Klingner, Marvin [1]
Termoehlen, Jan-Aike [1]
Mikolajczyk, Jonas [1]
Fingscheidt, Tim [1]
Affiliations
[1] Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany
Source
COMPUTER VISION - ECCV 2020, PT XX | 2020 / Vol. 12365
DOI
10.1007/978-3-030-58565-5_35
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Self-supervised monocular depth estimation presents a powerful method to obtain 3D scene information from single camera images, which is trainable on arbitrary image sequences without requiring depth labels, e.g., from a LiDAR sensor. In this work we present a new self-supervised semantically-guided depth estimation (SGDepth) method to deal with moving dynamic-class (DC) objects, such as moving cars and pedestrians, which violate the static-world assumptions typically made during training of such models. Specifically, we propose (i) mutually beneficial cross-domain training of (supervised) semantic segmentation and self-supervised depth estimation with task-specific network heads, (ii) a semantic masking scheme providing guidance to prevent moving DC objects from contaminating the photometric loss, and (iii) a detection method for frames with non-moving DC objects, from which the depth of DC objects can be learned. We demonstrate the performance of our method on several benchmarks, in particular on the Eigen split, where we exceed all baselines without test-time refinement.
Pages: 582-600
Page count: 19