Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

Times cited: 0

Authors:

Zhu, Shengjie [1]

Liu, Xiaoming [1]

Affiliation:

[1] Michigan State Univ, Dept Comp Sci & Engn, E Lansing, MI 48824 USA

Source:

COMPUTER VISION - ECCV 2024, PT LXXXII | 2025, Vol. 15140

Keywords:

Self-supervision; Depth; Pose; Structure-from-Motion

DOI:

10.1007/978-3-031-73007-8_3

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Both self-supervised depth estimation and Structure-from-Motion (SfM) recover scene depth from RGB videos. Despite sharing a similar objective, the two approaches remain disconnected. Prior self-supervised works backpropagate losses defined within immediately neighboring frames. Instead of learning through a loss, this work proposes an alternative scheme that performs local SfM. First, given calibrated RGB or RGB-D images, we employ depth and correspondence estimators to infer depthmaps and pair-wise correspondence maps. Then, a novel bundle-RANSAC-adjustment algorithm jointly optimizes camera poses and one depth adjustment for each depthmap. Finally, we fix the camera poses and employ a NeRF, albeit without a neural network, for dense triangulation and geometric verification. Poses, depth adjustments, and triangulated sparse depths are our outputs. For the first time, we show that self-supervision within 5 frames already benefits SoTA supervised depth and correspondence models. Despite being self-supervised, our pose algorithm has certified global optimality, outperforming optimization-based, learning-based, and NeRF-based prior art. The project page is available at the link.
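The abstract's bundle-RANSAC-adjustment step fits camera poses together with one depth adjustment per depthmap; the record does not describe that algorithm. As a much-simplified illustration of the RANSAC idea only, the sketch below assumes the poses are already fixed, that the depth adjustment is a single multiplicative scale, and that sparse reference depths (for example, triangulated ones) are available at some pixels. The function `ransac_depth_scale`, its parameters, and the relative-error inlier test are hypothetical choices, not the authors' method.

```python
import random
from typing import List, Tuple


def ransac_depth_scale(
    pred: List[float],
    sparse: List[float],
    thresh: float = 0.05,
    iters: int = 100,
    seed: int = 0,
) -> Tuple[float, List[int]]:
    """Robustly fit one scale s so that sparse[j] ~= s * pred[j].

    Hypothetical helper, NOT the authors' bundle-RANSAC-adjustment:
    pred   -- predicted depths sampled where a sparse depth exists
    sparse -- sparse reference depths (e.g. triangulated) at those pixels
    A pixel is an inlier if its relative error is below thresh.
    """
    if len(pred) != len(sparse) or not pred:
        raise ValueError("inputs must be non-empty and equally sized")
    if any(p <= 0 for p in pred) or any(d <= 0 for d in sparse):
        raise ValueError("depths must be positive")
    if thresh <= 0 or iters < 1:
        raise ValueError("thresh must be positive and iters at least 1")

    rng = random.Random(seed)  # seeded so results are reproducible
    best_inliers: List[int] = []
    for _ in range(iters):
        # Minimal sample: a single correspondence determines the scale.
        i = rng.randrange(len(pred))
        s = sparse[i] / pred[i]
        inliers = [
            j for j in range(len(pred))
            if abs(s * pred[j] - sparse[j]) / sparse[j] < thresh
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers

    # Least-squares refit on the consensus set: s = sum(p*d) / sum(p*p).
    num = sum(pred[j] * sparse[j] for j in best_inliers)
    den = sum(pred[j] * pred[j] for j in best_inliers)
    return num / den, best_inliers


if __name__ == "__main__":
    pred = [1.0, 2.0, 3.0, 4.0, 5.0, 2.5]
    sparse = [2.0, 4.0, 6.0, 8.0, 10.0, 20.0]  # last pixel is an outlier
    scale, inliers = ransac_depth_scale(pred, sparse)
    print(scale, inliers)  # 2.0 [0, 1, 2, 3, 4]
```

Because one sample fixes the scale, each hypothesis is cheap to score, and the final least-squares refit uses every inlier rather than the single sample. Measuring relative rather than absolute error keeps one threshold meaningful for both near and far pixels.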

Pages: 38-56

Page count: 19

Cited references (78 in total):

[1] Agarwal, Sameer; Furukawa, Yasutaka; Snavely, Noah; Simon, Ian; Curless, Brian; Seitz, Steven M.; Szeliski, Richard. Building Rome in a Day. COMMUNICATIONS OF THE ACM, 2011, 54(10): 105-112.

[2] Barron, Jonathan T.; Mildenhall, Ben; Verbin, Dor; Srinivasan, Pratul P.; Hedman, Peter. Mip-NeRF 360: Unbounded Anti-Aliased Neural Radiance Fields. 2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2022), 2022: 5460-5469.

[3] Beder C, 2006, LECT NOTES COMPUT SC, V4174, P657

[4] Bhat SF, 2023, arXiv:2302.12288, DOI 10.48550/ARXIV.2302.12288

[5] Bhat, Shariq Farooq; Alhashim, Ibraheem; Wonka, Peter. AdaBins: Depth Estimation Using Adaptive Bins. 2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021: 4008-4017.

[6] Bian JW, 2019, ADV NEUR IN, V32

[7] Brown, Matthew; Hua, Gang; Winder, Simon. Discriminative Learning of Local Image Descriptors. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2011, 33(01): 43-57.

[8] Butime J, 2006, VISAPP

[9] Casser V, 2019, AAAI CONF ARTIF INTE, P8001

[10] Chen, Yuhua; Schmid, Cordelia; Sminchisescu, Cristian. Self-supervised Learning with Geometric Constraints in Monocular Video: Connecting Flow, Depth, and Camera. 2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019: 7062-7071.
