Two-Stream Based Multi-Stage Hybrid Decoder for Self-Supervised Multi-Frame Monocular Depth

Cited by: 7
Authors
Long, Yangqi [1 ]
Yu, Huimin [1 ,2 ,3 ,4 ]
Liu, Biyang [1 ]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, ZJU League Res & Dev Ctr, Hangzhou 310027, Zhejiang, Peoples R China
[3] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou 310027, Zhejiang, Peoples R China
[4] Zhejiang Prov Key Lab Informat Proc Commun & Netw, Hangzhou, Peoples R China
Keywords
Deep learning for visual perception; deep learning methods; visual learning
DOI
10.1109/LRA.2022.3214787
Chinese Library Classification (CLC) number
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
Self-supervised depth estimation has recently attracted considerable attention due to its low cost. Although they are trained with self-supervision from image sequences, current single-image methods infer depth only from scene information and ignore matching information, which is also important. Nevertheless, matching information is not always reliable, especially in texture-less and occluded regions. It is therefore attractive to combine the strengths of single-image scene information and multi-frame matching information. In this letter, we propose a two-stream, multi-stage hybrid decoder that effectively accomplishes this integration. The hybrid decoder consists of two pathways, one for each kind of information, and fuses them interactively. Specifically, a cost volume is built on the scene prior to represent the matching information and is fed back to the single-image pathway to complete the integration. To further facilitate the interactive integration, a multi-stage fusion strategy is embedded seamlessly into the hybrid decoder, yielding more accurate depth results. Our approach outperforms existing self-supervised methods on the KITTI and Cityscapes datasets.
Pages: 12291-12298
Number of pages: 8
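
As a rough illustration of the decoder described in the abstract, the following is a minimal PyTorch sketch of the general idea: matching features encoded from a cost volume are injected into a single-image decoding pathway at every stage before the depth prediction. This is not the authors' implementation; the module names, channel sizes, number of stages, and cost-volume handling are all assumptions made for illustration.

# Illustrative sketch only (not the authors' code): a two-stream decoder that
# fuses a single-image pathway with a matching pathway derived from a cost volume.
# Channel sizes, the number of stages, and the cost-volume shape are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionStage(nn.Module):
    """One decoding stage: upsample the single-image features, then fuse in
    the matching features encoded from the cost volume."""

    def __init__(self, img_ch, match_ch, out_ch):
        super().__init__()
        self.img_conv = nn.Conv2d(img_ch, out_ch, 3, padding=1)
        self.fuse_conv = nn.Conv2d(out_ch + match_ch, out_ch, 3, padding=1)

    def forward(self, img_feat, match_feat):
        img_feat = F.interpolate(img_feat, scale_factor=2, mode="nearest")
        img_feat = F.relu(self.img_conv(img_feat))
        match_feat = F.interpolate(match_feat, size=img_feat.shape[-2:], mode="nearest")
        return F.relu(self.fuse_conv(torch.cat([img_feat, match_feat], dim=1)))


class TwoStreamHybridDecoder(nn.Module):
    """Multi-stage hybrid decoder (sketch): the matching stream is injected
    into the single-image stream at every decoding stage."""

    def __init__(self, img_ch=256, cv_ch=64, stages=3):
        super().__init__()
        self.match_conv = nn.Conv2d(cv_ch, 32, 3, padding=1)  # encode the cost volume
        chs = [img_ch // (2 ** i) for i in range(stages + 1)]
        self.stages = nn.ModuleList(
            FusionStage(chs[i], 32, chs[i + 1]) for i in range(stages)
        )
        self.depth_head = nn.Conv2d(chs[stages], 1, 3, padding=1)

    def forward(self, img_feat, cost_volume):
        match_feat = F.relu(self.match_conv(cost_volume))
        x = img_feat
        for stage in self.stages:
            x = stage(x, match_feat)
        # sigmoid disparity in (0, 1), as is common in self-supervised depth networks
        return torch.sigmoid(self.depth_head(x))


if __name__ == "__main__":
    decoder = TwoStreamHybridDecoder()
    img_feat = torch.randn(1, 256, 24, 80)      # single-image encoder features
    cost_volume = torch.randn(1, 64, 24, 80)    # matching cost over 64 depth bins (assumed)
    print(decoder(img_feat, cost_volume).shape)  # -> torch.Size([1, 1, 192, 640])

The per-stage injection, rather than a single fusion at the bottleneck, mirrors the multi-stage fusion strategy described in the abstract; in the actual method the cost volume is additionally built on the single-image scene prior before being fed back to the single-image pathway.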