Uni-DPM: Unifying Self-Supervised Monocular Depth, Pose, and Object Motion Estimation With a Shared Representation

Times Cited: 0
Authors
Wu, Guanghui [1 ]
Chen, Lili [2 ]
Chen, Zengping [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cameras; Computer vision; Three-dimensional displays; Optical flow; Image motion analysis; Estimation; Videos; Motion segmentation; Geometry; Depth measurement; Self-supervised learning; monocular depth estimation; odometry; scene flow estimation; motion segmentation; VISUAL ODOMETRY; SLAM;
DOI
10.1109/TMM.2024.3521846
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Self-supervised monocular depth estimation has been widely studied for 3D perception, as it can infer depth, pose, and object motion from monocular videos. However, existing single-view and multi-view methods employ separate networks to learn task-specific representations, which not only results in a cumbersome model architecture but also limits representation capacity. In this paper, we revisit previous methods and draw two insights: (1) the three tasks are reciprocal and all depend on matching information, and (2) different representations carry complementary information. Based on these insights, we propose Uni-DPM, a compact self-supervised framework that completes all three tasks with a shared representation. Specifically, we introduce a U-Net-like model that synchronously completes multiple tasks by leveraging their common dependence on matching information, and iteratively refines the predictions by utilizing the reciprocity among tasks. Furthermore, we design a shared Appearance-Matching-Temporal (AMT) representation for the three tasks by exploiting the complementarity among different types of information. In addition, Uni-DPM is scalable to downstream tasks, including scene flow, optical flow, and motion segmentation. Comparative experiments demonstrate the competitiveness of Uni-DPM on these tasks, and ablation experiments verify our insights.
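The central architectural idea in the abstract — one shared representation feeding depth, pose, and object-motion heads, rather than three separate task networks — can be illustrated with a toy numpy sketch. Everything here (function names, shapes, the tanh/exp stand-ins) is hypothetical and not the authors' code; it only shows the wiring, not the actual U-Net-like model or AMT representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_encoder(frame_a, frame_b):
    """Stand-in for the U-Net-like model: fuse a frame pair into a
    single shared feature map carrying matching information."""
    return np.tanh(frame_a - frame_b)

def depth_head(feat):
    # Per-pixel depth; exp keeps it positive.
    return np.exp(feat)

def pose_head(feat):
    # 6-DoF ego-motion stand-in (3 rotation + 3 translation).
    return np.full(6, feat.mean())

def motion_head(feat):
    # Binary moving-object mask stand-in.
    return (feat > 0).astype(float)

# One shared feature is computed once, then reused by all three heads.
frame_a, frame_b = rng.standard_normal((2, 8, 8))
feat = shared_encoder(frame_a, frame_b)
depth = depth_head(feat)    # (8, 8) depth map
pose = pose_head(feat)      # (6,) camera motion vector
motion = motion_head(feat)  # (8, 8) motion mask
```

In the paper's actual framework the heads also feed back into each other (iterative refinement via task reciprocity), which this one-pass sketch omits.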
Pages: 1498-1511
Number of Pages: 14