Look Deeper into Depth: Monocular Depth Estimation with Semantic Booster and Attention-Driven Loss

Times cited: 117
Authors
Jiao, Jianbo [1, 2]
Cao, Ying [1]
Song, Yibing [3]
Lau, Rynson [1]
Affiliations
[1] City Univ Hong Kong, Kowloon, Hong Kong, Peoples R China
[2] Univ Illinois, Urbana, IL 61801 USA
[3] Tencent AI Lab, Shenzhen, Peoples R China
Source
COMPUTER VISION - ECCV 2018, PT 15 | 2018 / Vol. 11219
Keywords
Monocular depth; Semantic labeling; Attention loss;
DOI
10.1007/978-3-030-01267-0_4
CLC classification number
TP18 [Theory of artificial intelligence]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Monocular depth estimation benefits greatly from learning-based techniques. By studying the training data, we observe that the per-pixel depth values in existing datasets typically exhibit a long-tailed distribution. However, most previous approaches treat all regions in the training data equally regardless of the imbalanced depth distribution, which restricts model performance, particularly in distant depth regions. In this paper, we investigate the long-tail property and delve deeper into the distant depth regions (i.e. the tail part) to propose an attention-driven loss for network supervision. In addition, to better leverage semantic information for monocular depth estimation, we propose a synergy network that automatically learns information-sharing strategies between the two tasks. With the proposed attention-driven loss and synergy network, the depth estimation and semantic labeling tasks can be mutually improved. Experiments on the challenging indoor dataset show that the proposed approach achieves state-of-the-art performance on both monocular depth estimation and semantic labeling tasks.
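To make the idea of depth-aware supervision concrete, here is a minimal sketch in plain Python, assuming depth maps are flattened into lists of depths in metres. The weighting scheme (ground-truth depth divided by the sample's maximum depth, added to a base weight of 1) and the names `depth_attention_weights` and `attention_weighted_l1` are hypothetical illustrations of up-weighting the distant tail of the depth distribution; they are not the paper's exact attention-driven loss, whose formulation should be taken from the paper itself.

```python
from typing import List, Sequence


def depth_attention_weights(gt: Sequence[float], eps: float = 1e-6) -> List[float]:
    """Hypothetical per-pixel weights: each ground-truth depth divided by the
    largest depth in the sample, so distant (tail) pixels get weights near 1
    and near pixels get small weights. Assumes non-negative depths."""
    d_max = max(gt)
    return [d / (d_max + eps) for d in gt]


def attention_weighted_l1(pred: Sequence[float], gt: Sequence[float]) -> float:
    """Mean absolute error with each pixel scaled by (1 + weight).

    The base weight of 1 keeps near pixels from being ignored entirely; this
    is an illustrative assumption, not the formula used in the paper.
    """
    if len(pred) != len(gt) or not gt:
        raise ValueError("pred and gt must be non-empty and of equal length")
    weights = depth_attention_weights(gt)
    total = sum((1.0 + w) * abs(p - g) for p, g, w in zip(pred, gt, weights))
    return total / len(gt)


if __name__ == "__main__":
    # Same 0.5 m absolute error at a near pixel (1 m) and a far pixel (8 m).
    print(attention_weighted_l1([1.5, 7.5], [1.0, 8.0]))
```

With equal absolute errors of 0.5 m at the near and far pixels, the far pixel contributes about 1.0 to the sum and the near pixel about 0.56, so the mean loss is roughly 0.78 instead of the unweighted 0.5. This is the intended effect of a depth-aware loss: errors in sparsely represented distant regions are not drowned out by the abundant near ones.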
Pages: 55-71
Number of pages: 17