Uni-DPM: Unifying Self-Supervised Monocular Depth, Pose, and Object Motion Estimation With a Shared Representation

Times Cited: 0
Authors
Wu, Guanghui [1 ]
Chen, Lili [2 ]
Chen, Zengping [1 ]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cameras; Computer vision; Three-dimensional displays; Optical flow; Image motion analysis; Estimation; Videos; Motion segmentation; Geometry; Depth measurement; Self-supervised learning; monocular depth estimation; odometry; scene flow estimation; motion segmentation; VISUAL ODOMETRY; SLAM;
DOI
10.1109/TMM.2024.3521846
CLC Number
TP [Automation Technology, Computer Technology];
Subject Classification Code
0812;
Abstract
Self-supervised monocular depth estimation has been widely studied for 3D perception, as it can infer depth, pose, and object motion from monocular videos. However, existing single-view and multi-view methods employ separate networks to learn task-specific representations, which not only results in a cumbersome model architecture but also limits representation capacity. In this paper, we revisit previous methods and draw two insights: (1) the three tasks are reciprocal and all depend on matching information, and (2) different representations carry complementary information. Based on these insights, we propose Uni-DPM, a compact self-supervised framework that completes all three tasks with a shared representation. Specifically, we introduce a U-Net-like model that synchronously completes multiple tasks by leveraging their common dependence on matching information, and iteratively refines the predictions by utilizing the reciprocity among tasks. Furthermore, we design a shared Appearance-Matching-Temporal (AMT) representation for the three tasks by exploiting the complementarity among different types of information. In addition, Uni-DPM is scalable to downstream tasks, including scene flow, optical flow, and motion segmentation. Comparative experiments demonstrate the competitiveness of Uni-DPM on these tasks, and ablation experiments verify our insights.
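The central architectural idea in the abstract — one shared representation feeding depth, pose, and object-motion heads, rather than three separate task networks — can be illustrated with a toy numpy sketch. Everything here (function names, shapes, the tanh/exp stand-ins) is hypothetical and not the authors' code; it only shows the wiring, not the actual U-Net-like model or AMT representation.

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_encoder(frame_a, frame_b):
    """Stand-in for the U-Net-like model: fuse a frame pair into a
    single shared feature map carrying matching information."""
    return np.tanh(frame_a - frame_b)

def depth_head(feat):
    # Per-pixel depth; exp keeps it positive.
    return np.exp(feat)

def pose_head(feat):
    # 6-DoF ego-motion stand-in (3 rotation + 3 translation).
    return np.full(6, feat.mean())

def motion_head(feat):
    # Binary moving-object mask stand-in.
    return (feat > 0).astype(float)

# One shared feature is computed once, then reused by all three heads.
frame_a, frame_b = rng.standard_normal((2, 8, 8))
feat = shared_encoder(frame_a, frame_b)
depth = depth_head(feat)    # (8, 8) depth map
pose = pose_head(feat)      # (6,) camera motion vector
motion = motion_head(feat)  # (8, 8) motion mask
```

In the paper's actual framework the heads also feed back into each other (iterative refinement via task reciprocity), which this one-pass sketch omits.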
Pages: 1498-1511
Number of Pages: 14