Semisupervised learning-based depth estimation with semantic inference guidance

Cited by: 8
Authors
Zhang Yan [1]
Fan XiaoPeng [1]
Zhao DeBin [1]
Affiliations
[1] Harbin Inst Technol, Dept Comp Sci & Technol, Harbin 150001, Peoples R China
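Funding
National High Technology Research and Development Program of China (863 Program); National Natural Science Foundation of China
Keywords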
depth estimation; semisupervised learning; semantic information; neural networks
DOI
10.1007/s11431-021-1948-3
Chinese Library Classification (CLC) number
T [Industrial Technology]
Discipline classification code
08
Abstract
Depth estimation is a fundamental computer vision problem that infers three-dimensional (3D) structure from a given scene. Because the problem is ill-posed, traditional methods generally require massive amounts of annotated data to fit the projection function from the scene to its 3D structure. Such pixel-level annotation is labor-intensive, especially for reflective surfaces such as mirrors or water. The widespread application of deep learning further intensifies the demand for large amounts of annotated data. A framework that reduces the amount of annotated data required is therefore urgently needed. In this paper, we propose a novel semisupervised learning framework to infer the 3D structure from a given scene. First, semantic information is employed to make the depth inference more accurate. Second, both depth estimation and semantic segmentation are built as coarse-to-fine frameworks, so that depth estimation can be gradually guided by semantic segmentation. We compare our model with state-of-the-art methods. The experimental results show that our method outperforms many supervised learning-based methods, which demonstrates its effectiveness.
Pages: 1098-1106
Number of pages: 9