Two-Stream Based Multi-Stage Hybrid Decoder for Self-Supervised Multi-Frame Monocular Depth

Cited by: 7

Authors:

Long, Yangqi ^{[1]}

Yu, Huimin ^{[1,2,3,4]}

Liu, Biyang ^{[1]}

Affiliations:

[1] Zhejiang Univ, Coll Informat Sci & Elect Engn, Hangzhou 310027, Peoples R China

[2] Zhejiang Univ, ZJU League Res & Dev Ctr, Hangzhou 310027, Zhejiang, Peoples R China

[3] Zhejiang Univ, State Key Lab CAD&CG, Hangzhou 310027, Zhejiang, Peoples R China

[4] Zhejiang Prov Key Lab Informat Proc Commun & Netw, Hangzhou, Peoples R China

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2022 / Vol. 7 / Issue 04

Keywords:

Deep learning for visual perception; deep learning methods; visual learning;

DOI:

10.1109/LRA.2022.3214787

Chinese Library Classification (CLC) number:

TP24 [Robot technology];

Discipline classification codes:

080202 ; 1405 ;

Abstract:

Self-supervised depth estimation has recently attracted considerable attention because of its low cost. Although they are trained with self-supervision from image sequences, current single-image methods infer depth from scene information alone and ignore matching information, which is also important. However, matching information is not always reliable, especially in texture-less and occluded regions. It is therefore attractive to combine the strengths of single-image scene information and multi-frame matching information. In this letter, we propose a two-stream based multi-stage hybrid decoder to accomplish this integration effectively. The hybrid decoder consists of two pathways, one for each kind of information, and fuses them interactively. Specifically, a cost volume is built from the scene prior to represent the matching information and is fed back to the single-image pathway to complete the integration. To further facilitate the interactive integration, a multi-stage fusion strategy is embedded seamlessly into the hybrid decoder, resulting in more accurate depth estimates. Our approach outperforms existing self-supervised methods on the KITTI and Cityscapes datasets.
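The abstract does not say how the cost volume is built from the scene prior, so the following is only an illustrative sketch of one common way such an idea can be realized, not the authors' method. It assumes that the single-image depth acts as a per-pixel prior, that depth candidates are sampled in a relative range around it, and that a matching cost per candidate is then reduced to a depth map by soft-argmin. The function names, `num_bins`, and `rel_range` are hypothetical, and the matching cost itself (which would require warping neighbouring frames) is left to the caller.

```python
import numpy as np


def prior_guided_hypotheses(prior_depth, num_bins=8, rel_range=0.2):
    """Per-pixel depth candidates centred on a single-image depth prior.

    prior_depth: (H, W) array of positive depths (assumed input).
    Returns a (num_bins, H, W) array whose candidates span
    [prior * (1 - rel_range), prior * (1 + rel_range)] at every pixel.
    """
    scales = np.linspace(1.0 - rel_range, 1.0 + rel_range, num_bins)
    return scales[:, None, None] * prior_depth[None, :, :]


def soft_argmin_depth(cost_volume, hypotheses):
    """Expected depth under softmax(-cost) along the hypothesis axis.

    cost_volume, hypotheses: (num_bins, H, W) arrays; a lower cost
    means a better match for that depth candidate.
    """
    logits = -cost_volume
    logits = logits - logits.max(axis=0, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    weights /= weights.sum(axis=0, keepdims=True)
    return (weights * hypotheses).sum(axis=0)
```

In this sketch a narrow `rel_range` trusts the single-image prior more, while a wide one leaves more room for the matching cost to correct it.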

References

Pages: 12291-12298

Number of pages: 8

45 references in total

[11] Unsupervised Monocular Depth Estimation with Left-Right Consistency [J].

Godard, Clement; Mac Aodha, Oisin; Brostow, Gabriel J.

30TH IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2017), 2017: 6602-6611

[12] Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras [J].

Gordon, Ariel; Li, Hanhan; Jonschkowski, Rico; Angelova, Anelia.

2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 8976-8985

[13] Cascade Cost Volume for High-Resolution Multi-View Stereo and Stereo Matching [J].

Gu, Xiaodong; Fan, Zhiwen; Zhu, Siyu; Dai, Zuozhuo; Tan, Feitong; Tan, Ping.

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 2492-2501

[14] Learning Optical Flow, Depth, and Scene Flow Without Real-World Labels [J].

Guizilini, Vitor; Lee, Kuan-Hui; Ambrus, Rares; Gaidon, Adrien.

IEEE ROBOTICS AND AUTOMATION LETTERS, 2022, 7(2): 3491-3498

[15] 3D Packing for Self-Supervised Monocular Depth Estimation [J].

Guizilini, Vitor; Ambrus, Rares; Pillai, Sudeep; Raventos, Allan; Gaidon, Adrien.

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 2482-2491

[16] DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth and Ego-motion from Monocular Videos [J].

Jiang, Hualie; Ding, Laiyan; Sun, Zhenglong; Huang, Rui.

2020 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2020: 10061-10067

[17] Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume [J].

Johnston, Adrian; Carneiro, Gustavo.

2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020: 4755-4764

[18] End-to-End Learning of Geometry and Context for Deep Stereo Regression [J].

Kendall, Alex; Martirosyan, Hayk; Dasgupta, Saumitro; Henry, Peter; Kennedy, Ryan; Bachrach, Abraham; Bry, Adam.

2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 66-75

[19] Khot T, 2019, arXiv, DOI arXiv:1905.02706

[20] CoMoDA: Continuous Monocular Depth Adaptation Using Past Experiences [J].

Kuznietsov, Yevhen; Proesmans, Marc; Van Gool, Luc.

2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2021), 2021: 2906-2916
