Improving Depth Estimation by Embedding Semantic Segmentation: A Hybrid CNN Model

Cited by: 13

Authors:

Valdez-Rodriguez, Jose E. [1]

Calvo, Hiram [1]

Felipe-Riveron, Edgardo [1]

Moreno-Armendariz, Marco A. [1]

Affiliations:

[1] Inst Politecn Nacl, Ctr Invest Comp, Av Juan de Dios Batiz S-N, Ciudad De Mexico 07738, Mexico

Source:

SENSORS | 2022 / Vol. 22 / Issue 04

Keywords:

depth estimation; hybrid convolutional neural networks; semantic segmentation; 3D CNN;

DOI:

10.3390/s22041669

Chinese Library Classification number:

O65 [Analytical Chemistry];

Discipline classification code:

070302 ; 081704 ;

Abstract:

Single-image depth estimation methods fail to separate foreground elements because these can easily be confounded with the background. To alleviate this problem, we propose the use of a semantic segmentation procedure that adds information to a depth estimator, in this case a 3D Convolutional Neural Network (CNN); the segmentation is coded as one-hot planes representing categories of objects. We explore 2D and 3D models. In particular, we propose a hybrid 2D-3D CNN architecture capable of obtaining semantic segmentation and depth estimation at the same time. We tested our procedure on the SYNTHIA-AL dataset and obtained sigma(3)=0.95 using manual segmentation, an improvement of 0.14 points over the state of the art (sigma(3)=0.81), and sigma(3)=0.89 using automatic semantic segmentation, showing that depth estimation is improved when the shape and position of objects in a scene are known.
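The one-hot coding mentioned in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the channel-first layout, and the choice to stack the planes with the RGB channels are assumptions made for illustration. The metric helper additionally assumes that sigma(3) denotes the common threshold accuracy (the fraction of pixels whose ratio to the ground truth is below 1.25^3), which this record does not state.

```python
import numpy as np


def one_hot_planes(labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Turn an (H, W) integer label map into (num_classes, H, W) binary planes."""
    classes = np.arange(num_classes).reshape(-1, 1, 1)
    # Broadcasting compares every pixel against every class index.
    return (labels[None, :, :] == classes).astype(np.float32)


def threshold_accuracy(pred: np.ndarray, target: np.ndarray,
                       power: int = 3, base: float = 1.25) -> float:
    """Assumed definition: share of pixels with max(pred/gt, gt/pred) < base**power."""
    ratio = np.maximum(pred / target, target / pred)
    return float(np.mean(ratio < base ** power))


# Hypothetical 2x2 scene with three object categories.
labels = np.array([[0, 1],
                   [2, 1]])
planes = one_hot_planes(labels, num_classes=3)

# Assumed input layout: RGB channels followed by one plane per category.
rgb = np.zeros((3, 2, 2), dtype=np.float32)
stacked = np.concatenate([rgb, planes], axis=0)
print(stacked.shape)
```

Each pixel is set in exactly one plane, so the planes sum to one everywhere; this gives the depth estimator an explicit cue about which object category occupies each position.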


Pages: 20

References: 38 in total

