Uni-DPM: Unifying Self-Supervised Monocular Depth, Pose, and Object Motion Estimation With a Shared Representation

Cited by: 0

Authors:

Wu, Guanghui [1]

Chen, Lili [2]

Chen, Zengping [1]

Affiliations:

[1] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China

[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100071, Peoples R China

Source:

IEEE TRANSACTIONS ON MULTIMEDIA | 2025 / Vol. 27

Funding:

National Natural Science Foundation of China;

Keywords:

Cameras; Computer vision; Three-dimensional displays; Optical flow; Image motion analysis; Estimation; Videos; Motion segmentation; Geometry; Depth measurement; Self-supervised learning; monocular depth estimation; odometry; scene flow estimation; motion segmentation; VISUAL ODOMETRY; SLAM;

DOI:

10.1109/TMM.2024.3521846

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Self-supervised monocular depth estimation has been widely studied for 3D perception, as it can infer depth, pose, and object motion from monocular videos. However, existing single-view and multi-view methods employ separate networks to learn task-specific representations. This not only results in a cumbersome model architecture but also limits the representation capacity. In this paper, we revisit previous methods and draw two insights: (1) these three tasks are reciprocal and all depend on matching information, and (2) different representations carry complementary information. Based on these insights, we propose Uni-DPM, a compact self-supervised framework that completes these three tasks with a shared representation. Specifically, we introduce a U-Net-like model that completes multiple tasks synchronously by leveraging their common dependence on matching information, and iteratively refines the predictions by utilizing the reciprocity among tasks. Furthermore, we design a shared Appearance-Matching-Temporal (AMT) representation for these three tasks by exploiting the complementarity among different types of information. In addition, Uni-DPM scales to downstream tasks, including scene flow, optical flow, and motion segmentation. Comparative experiments demonstrate the competitiveness of Uni-DPM on these tasks, and ablation experiments verify our insights.
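The abstract says that depth, pose, and object motion extend to scene flow, optical flow, and motion segmentation. The sketch below illustrates, for a single pixel, one standard geometric way such quantities can be composed under a pinhole camera model. It is not the paper's implementation: the function names, the frame convention (pose maps points from the current frame into the next frame), the definition of scene flow as total 3D displacement in the camera frame, and the motion threshold are all assumptions made for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def backproject(u: float, v: float, depth: float,
                fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Lift pixel (u, v) with known depth to a 3D point (pinhole model)."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def project(p: Vec3, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Project a 3D camera-frame point back to pixel coordinates."""
    x, y, z = p
    return (fx * x / z + cx, fy * y / z + cy)


def compose_motion(u: float, v: float, depth: float,
                   rotation: Sequence[Sequence[float]],
                   translation: Vec3,
                   object_motion: Vec3,
                   intrinsics: Tuple[float, float, float, float]):
    """Return (scene_flow, optical_flow) for one pixel.

    Assumed convention (hypothetical, for illustration only):
    moved = R @ P + t + object_motion, where P is the back-projected point.
    """
    fx, fy, cx, cy = intrinsics
    p = backproject(u, v, depth, fx, fy, cx, cy)
    # Rigid part: camera ego-motion applied to the static 3D point.
    rigid = tuple(
        sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    # Non-rigid part: independent 3D motion of the object at this pixel.
    moved = tuple(rigid[i] + object_motion[i] for i in range(3))
    scene_flow = tuple(moved[i] - p[i] for i in range(3))
    u2, v2 = project(moved, fx, fy, cx, cy)
    optical_flow = (u2 - u, v2 - v)
    return scene_flow, optical_flow


def is_moving(object_motion: Vec3, threshold: float = 0.05) -> bool:
    """Label a pixel as dynamic when its object motion exceeds a threshold."""
    return sum(c * c for c in object_motion) ** 0.5 > threshold
```

For example, with an identity rotation, zero translation, a focal length of 100 pixels, and a point at 10 m depth that moves 1 m sideways, `compose_motion` yields a horizontal optical flow of 10 pixels and `is_moving` labels the pixel as dynamic.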


Pages: 1498-1511

Page count: 14

