Two-Stream Based Multi-Stage Hybrid Decoder for Self-Supervised Multi-Frame Monocular Depth

Cited by: 7
Authors
Long, Yangqi [1 ]
Yu, Huimin [1 ,2 ,3 ,4 ]
Liu, Biyang [1 ]
Affiliations
[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Peoples R China
[2] Zhejiang Univ, ZJU League Res & Dev Ctr, Hangzhou 310027, Zhejiang, Peoples R China
[3] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou 310027, Zhejiang, Peoples R China
[4] Zhejiang Prov Key Lab Informat Proc Commun & Netw, Hangzhou, Peoples R China
Keywords
Deep learning for visual perception; deep learning methods; visual learning
DOI
10.1109/LRA.2022.3214787
Chinese Library Classification (CLC) number
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
Self-supervised depth estimation has recently attracted considerable attention due to its low cost. Although they are trained with self-supervision from image sequences, current single-image methods infer depth only from scene information and ignore matching information, which is also important. Nevertheless, matching information is not always reliable, especially in texture-less and occluded regions. It is therefore attractive to combine the strengths of single-image scene information and multi-frame matching information. In this letter, we propose a two-stream, multi-stage hybrid decoder that effectively accomplishes this integration. The hybrid decoder consists of two pathways, one for each kind of information, and fuses them interactively. Specifically, a cost volume is built on the scene prior to represent the matching information and is fed back to the single-image pathway to complete the integration. To further facilitate the interactive integration, a multi-stage fusion strategy is embedded seamlessly into the hybrid decoder, yielding more accurate depth results. Our approach outperforms existing self-supervised methods on the KITTI and Cityscapes datasets.
Pages: 12291-12298
Number of pages: 8
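
As a rough illustration of the decoder described in the abstract, the following is a minimal PyTorch sketch of the general idea: matching features encoded from a cost volume are injected into a single-image decoding pathway at every stage before the depth prediction. This is not the authors' implementation; the module names, channel sizes, number of stages, and cost-volume handling are all assumptions made for illustration.

# Illustrative sketch only (not the authors' code): a two-stream decoder that
# fuses a single-image pathway with a matching pathway derived from a cost volume.
# Channel sizes, the number of stages, and the cost-volume shape are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class FusionStage(nn.Module):
    """One decoding stage: upsample the single-image features, then fuse in
    the matching features encoded from the cost volume."""

    def __init__(self, img_ch, match_ch, out_ch):
        super().__init__()
        self.img_conv = nn.Conv2d(img_ch, out_ch, 3, padding=1)
        self.fuse_conv = nn.Conv2d(out_ch + match_ch, out_ch, 3, padding=1)

    def forward(self, img_feat, match_feat):
        img_feat = F.interpolate(img_feat, scale_factor=2, mode="nearest")
        img_feat = F.relu(self.img_conv(img_feat))
        match_feat = F.interpolate(match_feat, size=img_feat.shape[-2:], mode="nearest")
        return F.relu(self.fuse_conv(torch.cat([img_feat, match_feat], dim=1)))


class TwoStreamHybridDecoder(nn.Module):
    """Multi-stage hybrid decoder (sketch): the matching stream is injected
    into the single-image stream at every decoding stage."""

    def __init__(self, img_ch=256, cv_ch=64, stages=3):
        super().__init__()
        self.match_conv = nn.Conv2d(cv_ch, 32, 3, padding=1)  # encode the cost volume
        chs = [img_ch // (2 ** i) for i in range(stages + 1)]
        self.stages = nn.ModuleList(
            FusionStage(chs[i], 32, chs[i + 1]) for i in range(stages)
        )
        self.depth_head = nn.Conv2d(chs[stages], 1, 3, padding=1)

    def forward(self, img_feat, cost_volume):
        match_feat = F.relu(self.match_conv(cost_volume))
        x = img_feat
        for stage in self.stages:
            x = stage(x, match_feat)
        # sigmoid disparity in (0, 1), as is common in self-supervised depth networks
        return torch.sigmoid(self.depth_head(x))


if __name__ == "__main__":
    decoder = TwoStreamHybridDecoder()
    img_feat = torch.randn(1, 256, 24, 80)      # single-image encoder features
    cost_volume = torch.randn(1, 64, 24, 80)    # matching cost over 64 depth bins (assumed)
    print(decoder(img_feat, cost_volume).shape)  # -> torch.Size([1, 1, 192, 640])

The per-stage injection, rather than a single fusion at the bottleneck, mirrors the multi-stage fusion strategy described in the abstract; in the actual method the cost volume is additionally built on the single-image scene prior before being fed back to the single-image pathway.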