On Robust Cross-view Consistency in Self-supervised Monocular Depth Estimation

Cited: 0
Authors
Haimei Zhao
Jing Zhang
Zhuo Chen
Bo Yuan
Dacheng Tao
Affiliations
[1] University of Sydney, School of Computer Science
[2] Tsinghua University, Shenzhen International Graduate School
[3] University of Queensland, School of Information Technology & Electrical Engineering
Source
Machine Intelligence Research | 2024, Vol. 21
Keywords
3D vision; depth estimation; cross-view consistency; self-supervised learning; monocular perception
DOI
Not available
Abstract
Remarkable progress has been made in self-supervised monocular depth estimation (SS-MDE) by exploring cross-view consistency, e.g., photometric consistency and 3D point cloud consistency. However, these consistency constraints are highly vulnerable to illumination variance, occlusions, texture-less regions, and moving objects, and are therefore not robust enough to handle diverse scenes. To address this challenge, we study two kinds of robust cross-view consistency in this paper. First, the spatial offset field between adjacent frames is obtained by reconstructing the reference frame from its neighbors via deformable alignment; this field is then used to align temporal depth features via a depth feature alignment (DFA) loss. Second, the 3D point clouds of each reference frame and its nearby frames are computed and transformed into voxel space, where the point density in each voxel is calculated and aligned via a voxel density alignment (VDA) loss. In this way, we exploit temporal coherence in both the depth feature space and the 3D voxel space for SS-MDE, shifting the "point-to-point" alignment paradigm to a "region-to-region" one. Compared with the photometric consistency loss and the rigid point cloud alignment loss, the proposed DFA and VDA losses are more robust owing to the strong representation power of deep features and the high tolerance of voxel density to the aforementioned challenges. Experimental results on several outdoor benchmarks show that our method outperforms current state-of-the-art techniques. Extensive ablation studies and analysis validate the effectiveness of the proposed losses, especially in challenging scenes. The code and models are available at https://github.com/sunnyHelen/RCVC-depth.
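To make the DFA idea concrete, below is a minimal sketch of aligning the depth features of a neighboring frame to the reference frame with a dense 2D offset field and penalizing the residual. It assumes the offset field has already been produced by a deformable-alignment module (here a stand-in tensor); all names, shapes, and the L1 penalty are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a depth feature alignment (DFA) style loss.
import torch
import torch.nn.functional as F

def warp_features(src_feat: torch.Tensor, offset: torch.Tensor) -> torch.Tensor:
    """Warp source features toward the reference view with a per-pixel offset field.

    src_feat: (B, C, H, W) depth features of a neighboring frame.
    offset:   (B, 2, H, W) pixel displacements (dx, dy); assumed to come from a
              deformable-alignment module in the actual method.
    """
    b, _, h, w = src_feat.shape
    # Base sampling grid in normalized [-1, 1] coordinates (x, y order for grid_sample).
    ys, xs = torch.meshgrid(torch.linspace(-1, 1, h), torch.linspace(-1, 1, w),
                            indexing="ij")
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(b, -1, -1, -1)
    # Convert pixel offsets to normalized coordinates and shift the grid.
    norm = torch.stack((offset[:, 0] * 2.0 / (w - 1),
                        offset[:, 1] * 2.0 / (h - 1)), dim=-1)
    return F.grid_sample(src_feat, base + norm, align_corners=True)

def dfa_loss(ref_feat, src_feat, offset):
    """L1 distance between reference features and offset-aligned source features."""
    return torch.abs(ref_feat - warp_features(src_feat, offset)).mean()

# Usage with random stand-ins:
ref, src = torch.randn(1, 64, 24, 80), torch.randn(1, 64, 24, 80)
offset = torch.zeros(1, 2, 24, 80)  # zero offset reduces to a plain feature L1
print(dfa_loss(ref, src, offset))
```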
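Similarly, the VDA idea of comparing per-voxel point densities rather than individual points can be sketched as below. The grid bounds, resolution, and hard binning are assumptions for illustration; note that hard binning is not differentiable with respect to the points, so a trainable version would need a soft assignment.

```python
# Hypothetical sketch of a voxel density alignment (VDA) style loss, assuming
# both point clouds are (N, 3) tensors already expressed in a common frame.
import torch

def voxel_density(points: torch.Tensor,
                  grid_min: float = -10.0,
                  grid_max: float = 10.0,
                  resolution: int = 32) -> torch.Tensor:
    """Fraction of points falling in each voxel of a cubic grid."""
    scale = resolution / (grid_max - grid_min)
    # Hard-assign each point to a voxel index; clamp points outside the grid.
    idx = ((points - grid_min) * scale).long().clamp(0, resolution - 1)
    # Flatten 3D voxel indices into a single linear index and count occupancy.
    flat = (idx[:, 0] * resolution + idx[:, 1]) * resolution + idx[:, 2]
    counts = torch.bincount(flat, minlength=resolution ** 3).float()
    return counts / points.shape[0]  # normalize so clouds of different sizes compare

def vda_loss(ref_points: torch.Tensor, src_points: torch.Tensor) -> torch.Tensor:
    """L1 distance between the voxel density histograms of two point clouds."""
    return torch.abs(voxel_density(ref_points) - voxel_density(src_points)).mean()

# Usage with random stand-in clouds:
ref = torch.rand(4096, 3) * 20 - 10
src = ref + 0.05 * torch.randn_like(ref)  # a slightly perturbed second view
print(vda_loss(ref, src))
```

Because the loss compares aggregate densities per voxel rather than matching individual points, small per-point errors or missing points (e.g., from occlusion) perturb it far less than a rigid point-to-point alignment, which is the "region-to-region" tolerance the abstract describes.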
Pages: 495-513 (18 pages)