Uni-DPM: Unifying Self-Supervised Monocular Depth, Pose, and Object Motion Estimation With a Shared Representation

Cited by: 0
Authors
Wu, Guanghui [1]
Chen, Lili [2]
Chen, Zengping [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100071, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Cameras; Computer vision; Three-dimensional displays; Optical flow; Image motion analysis; Estimation; Videos; Motion segmentation; Geometry; Depth measurement; Self-supervised learning; monocular depth estimation; odometry; scene flow estimation; motion segmentation; VISUAL ODOMETRY; SLAM;
DOI
10.1109/TMM.2024.3521846
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Self-supervised monocular depth estimation has been widely studied for 3D perception, as it can infer depth, pose, and object motion from monocular videos. However, existing single-view and multi-view methods employ separate networks to learn task-specific representations. This not only results in a cumbersome model architecture but also limits representation capacity. In this paper, we revisit previous methods and draw two insights: (1) these three tasks are reciprocal and all depend on matching information, and (2) different representations carry complementary information. Based on these insights, we propose Uni-DPM, a compact self-supervised framework that completes these three tasks with a shared representation. Specifically, we introduce a U-Net-like model that completes multiple tasks simultaneously by leveraging their common dependence on matching information, and iteratively refines the predictions by exploiting the reciprocity among tasks. Furthermore, we design a shared Appearance-Matching-Temporal (AMT) representation for these three tasks by exploiting the complementarity among different types of information. In addition, Uni-DPM extends to downstream tasks, including scene flow, optical flow, and motion segmentation. Comparative experiments demonstrate the competitiveness of Uni-DPM on these tasks, and ablation experiments verify our insights.
Pages: 1498-1511
Number of pages: 14