Improving Depth Estimation by Embedding Semantic Segmentation: A Hybrid CNN Model

Cited by: 13
Authors
Valdez-Rodriguez, Jose E. [1]
Calvo, Hiram [1]
Felipe-Riveron, Edgardo [1]
Moreno-Armendariz, Marco A. [1]
Affiliation
[1] Inst Politecn Nacl, Ctr Invest Comp, Av Juan de Dios Batiz S-N, Ciudad De Mexico 07738, Mexico
Keywords
depth estimation; hybrid convolutional neural networks; semantic segmentation; 3D CNN
DOI
10.3390/s22041669
Chinese Library Classification number
O65 [Analytical Chemistry]
Discipline classification code
070302; 081704
Abstract
Single-image depth estimation methods fail to separate foreground elements because these are easily confounded with the background. To alleviate this problem, we propose a semantic segmentation procedure that adds information to a depth estimator, in this case a 3D Convolutional Neural Network (CNN); the segmentation is coded as one-hot planes representing categories of objects. We explore 2D and 3D models. In particular, we propose a hybrid 2D-3D CNN architecture capable of obtaining semantic segmentation and depth estimation at the same time. We tested our procedure on the SYNTHIA-AL dataset and obtained σ3 = 0.95 using manual segmentation, an improvement of 0.14 points over the state of the art (σ3 = 0.81), and σ3 = 0.89 using automatic semantic segmentation, showing that depth estimation improves when the shape and position of objects in a scene are known.
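The one-hot plane coding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the segmentation arrives as a grid of integer class ids, one per pixel. The function name `one_hot_planes`, the type aliases, and the tiny 2 x 3 label map are hypothetical and are not taken from the paper. How the resulting planes are combined with the image and fed to the 3D CNN is not stated here, so the sketch stops at the encoding step.

```python
from typing import List

LabelMap = List[List[int]]        # H x W grid of integer class ids (assumed input format)
Planes = List[List[List[int]]]    # C x H x W stack of binary planes


def one_hot_planes(labels: LabelMap, num_classes: int) -> Planes:
    """Turn an H x W label map into num_classes binary H x W planes.

    Plane c holds 1 where the pixel belongs to class c and 0 elsewhere.
    """
    planes = []
    for class_id in range(num_classes):
        plane = [[1 if pixel == class_id else 0 for pixel in row] for row in labels]
        planes.append(plane)
    return planes


# Hypothetical 2 x 3 label map with three object categories (0, 1, 2).
labels = [[0, 1, 1],
          [2, 2, 0]]
planes = one_hot_planes(labels, num_classes=3)
```

Each pixel is set in exactly one plane, so the planes carry the shape and position of every object category without implying any ordering between class ids.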
Pages: 20