Improving Depth Estimation by Embedding Semantic Segmentation: A Hybrid CNN Model

Times cited: 11
Authors
Valdez-Rodriguez, Jose E. [1]
Calvo, Hiram [1]
Felipe-Riveron, Edgardo [1]
Moreno-Armendariz, Marco A. [1]
Affiliation
[1] Inst Politecn Nacl, Ctr Invest Comp, Av Juan de Dios Batiz S-N, Ciudad De Mexico 07738, Mexico
Keywords
depth estimation; hybrid convolutional neural networks; semantic segmentation; 3D CNN
DOI
10.3390/s22041669
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification codes
070302; 081704
Abstract
Single-image depth estimation methods fail to separate foreground elements because these elements are easily confounded with the background. To alleviate this problem, we propose a semantic segmentation procedure that provides additional information to a depth estimator, in this case a 3D Convolutional Neural Network (CNN). The segmentation is coded as one-hot planes, each representing a category of objects. We explore both 2D and 3D models. In particular, we propose a hybrid 2D-3D CNN architecture capable of obtaining semantic segmentation and depth estimation at the same time. We tested our procedure on the SYNTHIA-AL dataset and obtained σ₃ = 0.95 using manual segmentation, an improvement of 0.14 over the state of the art (σ₃ = 0.81), and σ₃ = 0.89 using automatic semantic segmentation. These results show that depth estimation improves when the shape and position of the objects in a scene are known.
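The abstract relies on two concrete ideas: coding a segmentation map as one-hot planes, and scoring depth with σ₃. The Python sketch below illustrates both under stated assumptions. The function names `one_hot_planes` and `threshold_accuracy` are hypothetical; the planes are shown on their own, since how the paper combines them with the RGB input of the 3D CNN is not described here; and σ₃ is assumed to be the standard threshold accuracy used in depth estimation, i.e. the fraction of valid pixels whose ratio max(d/d*, d*/d) between predicted depth d and ground truth d* stays below 1.25³.

```python
from typing import List

IntGrid = List[List[int]]
FloatGrid = List[List[float]]


def one_hot_planes(labels: IntGrid, num_classes: int) -> List[IntGrid]:
    """Turn an H x W map of integer class IDs into num_classes binary planes.

    Plane k is 1 where a pixel belongs to class k and 0 elsewhere, giving a
    (num_classes, H, W) volume that a 3D convolution can slide over.
    (Hypothetical helper; the paper's exact input layout may differ.)
    """
    return [[[1 if c == k else 0 for c in row] for row in labels]
            for k in range(num_classes)]


def threshold_accuracy(pred: FloatGrid, gt: FloatGrid,
                       power: int = 3, base: float = 1.25) -> float:
    """Fraction of pixels with max(pred/gt, gt/pred) < base ** power.

    With the defaults this is the assumed sigma_3 (threshold 1.25**3).
    Pixels with non-positive depth in either map are treated as invalid
    and skipped; returns 0.0 if no valid pixel remains.
    """
    thresh = base ** power
    hits = total = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p <= 0 or g <= 0:
                continue  # invalid or missing depth
            total += 1
            if max(p / g, g / p) < thresh:
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    labels = [[0, 1], [2, 1]]
    print(one_hot_planes(labels, 3))
    pred = [[1.0, 2.0], [10.0, 4.0]]
    gt = [[1.1, 2.0], [4.0, 4.0]]
    print(threshold_accuracy(pred, gt))  # 3 of 4 pixels within 1.25**3
```

In this toy example only the pixel predicted at 10.0 against a ground truth of 4.0 (ratio 2.5, above 1.25³ ≈ 1.95) misses the threshold, so the score is 0.75. Each pixel is 1 in exactly one plane, which is what lets the network read object shape and position directly from the planes.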
Pages: 20