Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

Times cited: 0
Authors
Zhu, Shengjie [1]
Liu, Xiaoming [1]
Affiliations
[1] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA
Source
COMPUTER VISION-ECCV 2024, PT LXXXII | 2025 / Vol. 15140
Keywords
Self-supervision; Depth; Pose; Structure-from-Motion;
DOI
10.1007/978-3-031-73007-8_3
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Both self-supervised depth estimation and Structure-from-Motion (SfM) recover scene depth from RGB videos. Despite sharing a similar objective, the two approaches are disconnected. Prior works of self-supervision backpropagate losses defined within immediate neighboring frames. Instead of learning-through-loss, this work proposes an alternative scheme by performing local SfM. First, with calibrated RGB or RGB-D images, we employ a depth and correspondence estimator to infer depthmaps and pair-wise correspondence maps. Then, a novel bundle-RANSAC-adjustment algorithm jointly optimizes camera poses and one depth adjustment for each depthmap. Finally, we fix camera poses and employ a NeRF, however, without a neural network, for dense triangulation and geometric verification. Poses, depth adjustments, and triangulated sparse depths are our outputs. For the first time, we show self-supervision within 5 frames already benefits SoTA supervised depth and correspondence models. Despite self-supervision, our pose algorithm has certified global optimality, outperforming optimization-based, learning-based, and NeRF-based prior arts. The project page is held in the link.
Pages: 38-56
Page count: 19