Uni-DPM: Unifying Self-Supervised Monocular Depth, Pose, and Object Motion Estimation With a Shared Representation

Cited by: 0

Authors:

Wu, Guanghui [1]

Chen, Lili [2]

Chen, Zengping [1]

Affiliations:

[1] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China

[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100071, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2025 / Vol. 27

Funding:

National Natural Science Foundation of China;

Keywords:

Cameras; Computer vision; Three-dimensional displays; Optical flow; Image motion analysis; Estimation; Videos; Motion segmentation; Geometry; Depth measurement; Self-supervised learning; monocular depth estimation; odometry; scene flow estimation; motion segmentation; VISUAL ODOMETRY; SLAM;

DOI:

10.1109/TMM.2024.3521846

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Self-supervised monocular depth estimation has been widely studied for 3D perception, as it can infer depth, pose, and object motion from monocular videos. However, existing single-view and multi-view methods employ separate networks to learn specific representations for these different tasks. This not only results in a cumbersome model architecture but also limits the representation capacity. In this paper, we revisit previous methods and draw the following insights: (1) these three tasks are reciprocal and all depend on matching information, and (2) different representations carry complementary information. Based on these insights, we propose Uni-DPM, a compact self-supervised framework that completes these three tasks with a shared representation. Specifically, we introduce a U-Net-like model that completes multiple tasks synchronously by leveraging their common dependence on matching information, and iteratively refines the predictions by utilizing the reciprocity among tasks. Furthermore, we design a shared Appearance-Matching-Temporal (AMT) representation for these three tasks by exploiting the complementarity among different types of information. In addition, Uni-DPM is scalable to downstream tasks, including scene flow, optical flow, and motion segmentation. Comparative experiments demonstrate the competitiveness of Uni-DPM on these tasks, while ablation experiments verify our insights.

Citation

Pages: 1498-1511

Page count: 14

