IterDepth: Iterative Residual Refinement for Outdoor Self-Supervised Multi-Frame Monocular Depth Estimation

Cited by: 6
Authors
Feng, Cheng [1]
Chen, Zhen [1,2]
Zhang, Congxuan [2,3]
Hu, Weiming [4]
Li, Bing [5]
Lu, Feng [5]
Affiliations
[1] Beihang Univ, Sch Instrumentat & Optoelect Engn, Beijing 100191, Peoples R China
[2] Nanchang Hangkong Univ, Key Lab Nondestruct Testing, Minist Educ, Nanchang 330063, Peoples R China
[3] Nanchang Hangkong Univ, Sch Measuring & Opt Engn, Minist Educ, Nanchang 330063, Peoples R China
[4] Chinese Acad Sci, Inst Automat, Natl Lab Pattern Recognit, Beijing 100190, Peoples R China
[5] Nanchang Hangkong Univ, Sch Measuring & Opt Engn, Nanchang 330063, Peoples R China
Keywords
Estimation; Iterative methods; Cameras; Task analysis; Feature extraction; Decoding; Training; Monocular depth estimation; iterative refinement; self-supervised learning; deep learning;
DOI
10.1109/TCSVT.2023.3284479
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic and Communication Technology];
Discipline Classification Codes
0808; 0809
Abstract
Self-supervised monocular depth estimation has long been a challenging task in computer vision, as it relies only on monocular or stereo video for supervision. To address this challenge, we propose IterDepth, a novel multi-frame monocular depth estimation method based on an iterative residual refinement network. IterDepth extracts depth features from consecutive frames and computes a 3D cost volume that measures the difference between the current features and the previous features transformed by PoseCNN (a pose estimation convolutional neural network). We reformulate depth prediction as a residual learning problem, replacing the dominant direct depth regression paradigm to enable high-accuracy multi-frame monocular depth estimation. Specifically, we design a gated recurrent depth fusion unit that seamlessly blends depth features from the cost volume, image features, and the current depth prediction. The unit updates its hidden state and refines the depth map through iterative refinement, achieving more accurate predictions than existing methods. Experiments on the KITTI dataset demonstrate that IterDepth runs 7x faster in terms of FPS (frames per second) than the recent state-of-the-art DepthFormer model while achieving competitive accuracy. We also evaluate IterDepth on the Cityscapes dataset to showcase its generalization capability in other real-world environments. Moreover, IterDepth can balance accuracy and computational efficiency by adjusting the number of refinement iterations, and it performs competitively with other CNN-based monocular depth estimation approaches. Source code is available at https://github.com/PCwenyue/IterDepth-TCSVT.
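The iterative residual scheme described in the abstract can be illustrated with a toy, per-pixel sketch (plain Python with arbitrary scalar weights; the actual IterDepth unit is a convolutional GRU operating on cost-volume and image feature maps, and the real gates are learned, not hand-set as here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_depth_step(h, d, feat, w):
    """One gated recurrent depth fusion step (toy scalar version).

    h    -- hidden state carried across refinement iterations
    d    -- current depth estimate
    feat -- fused cost-volume/image feature for this pixel
    w    -- weight dict (illustrative values, not learned)
    """
    x = feat + d                                    # blend feature with current depth
    z = sigmoid(w["wz"] * x + w["uz"] * h)          # update gate
    r = sigmoid(w["wr"] * x + w["ur"] * h)          # reset gate
    h_cand = math.tanh(w["wh"] * x + w["uh"] * r * h)
    h = (1.0 - z) * h + z * h_cand                  # GRU hidden-state update
    delta = w["wd"] * h                             # predict a depth *residual*
    return h, d + delta                             # refine: d_{k+1} = d_k + delta_k

w = {"wz": 0.5, "uz": 0.3, "wr": 0.4, "ur": 0.2,
     "wh": 0.6, "uh": 0.5, "wd": 0.1}
h, d = 0.0, 5.0           # initial hidden state and coarse depth (arbitrary)
for k in range(4):        # more iterations -> more refinement, more compute
    h, d = gru_depth_step(h, d, feat=1.0, w=w)
```

The loop makes the accuracy/efficiency trade-off concrete: each extra iteration adds one GRU update and one residual, which is how the method trades compute for precision by choosing the iteration count.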
Pages: 329-341 (13 pages)
Related Papers
(50 records)
  • [1] Xu, Jingsheng; Guan, Bo; Zhao, Jianchang; Yi, Bo; Li, Jianmin. LungDepth: Self-Supervised Multi-Frame Monocular Depth Estimation for Bronchoscopy. INTERNATIONAL JOURNAL OF MEDICAL ROBOTICS AND COMPUTER ASSISTED SURGERY, 2025, 21(01).
  • [2] Wu, Guanghui; Liu, Hao; Wang, Longguang; Li, Kunhong; Guo, Yulan; Chen, Zengping. Self-Supervised Multi-Frame Monocular Depth Estimation for Dynamic Scenes. IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, 2024, 34(06): 4989-5001.
  • [3] Watson, Jamie; Mac Aodha, Oisin; Prisacariu, Victor; Brostow, Gabriel; Firman, Michael. The Temporal Opportunist: Self-Supervised Multi-Frame Monocular Depth. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021), 2021: 1164-1174.
  • [4] Wu, Wenhua; Wang, Guangming; Zhong, Jiquan; Wang, Hesheng; Liu, Zhe. Self-supervised Multi-frame Monocular Depth Estimation with Pseudo-LiDAR Pose Enhancement. 2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023: 10018-10025.
  • [5] Woo, Sungmin; Lee, Wonjoon; Kim, Woo Jin; Lee, Dogyoon; Lee, Sangyoun. ProDepth: Boosting Self-supervised Multi-frame Monocular Depth with Probabilistic Fusion. COMPUTER VISION - ECCV 2024, PT III, 2025, 15061: 201-217.
  • [6] Hur, Junhwa; Roth, Stefan. Self-Supervised Multi-Frame Monocular Scene Flow. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2021), 2021: 2683-2693.
  • [7] Guizilini, Vitor; Ambrus, Rares; Chen, Dian; Zakharov, Sergey; Gaidon, Adrien. Multi-Frame Self-Supervised Depth with Transformers. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 160-170.
  • [8] Wang, Xiang; Luo, Haonan; Wang, Zihang; Zheng, Jin; Bai, Xiao. Self-supervised multi-frame depth estimation with visual-inertial pose transformer and monocular guidance. INFORMATION FUSION, 2024, 108.
  • [9] Wang, Xiaofeng; Zhu, Zheng; Huang, Guan; Chi, Xu; Ye, Yun; Chen, Ziwei; Wang, Xingang. Crafting Monocular Cues and Velocity Guidance for Self-Supervised Multi-Frame Depth Learning. THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 3, 2023: 2689-2697.
  • [10] Liu, Jinfeng; Kong, Lingtong; Li, Bo; Wang, Zerong; Gu, Hong; Chen, Jinwei. Mono-ViFI: A Unified Learning Framework for Self-supervised Single and Multi-frame Monocular Depth Estimation. COMPUTER VISION - ECCV 2024, PT XLV, 2025, 15103: 90-107.