Uni-DPM: Unifying Self-Supervised Monocular Depth, Pose, and Object Motion Estimation With a Shared Representation

Citations: 0
Authors
Wu, Guanghui [1]
Chen, Lili [2]
Chen, Zengping [1]
Affiliations
[1] Sun Yat Sen Univ, Sch Elect & Commun Engn, Shenzhen Campus, Shenzhen 518107, Peoples R China
[2] Natl Innovat Inst Def Technol, Artificial Intelligence Res Ctr, Beijing 100071, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Cameras; Computer vision; Three-dimensional displays; Optical flow; Image motion analysis; Estimation; Videos; Motion segmentation; Geometry; Depth measurement; Self-supervised learning; monocular depth estimation; odometry; scene flow estimation; motion segmentation; VISUAL ODOMETRY; SLAM;
DOI
10.1109/TMM.2024.3521846
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Self-supervised monocular depth estimation has been widely studied for 3D perception, as it can infer depth, pose, and object motion from monocular videos. However, existing single-view and multi-view methods employ separate networks to learn specific representations for these different tasks. This not only results in a cumbersome model architecture but also limits the representation capacity. In this paper, we revisit previous methods and draw two insights: (1) these three tasks are reciprocal and all depend on matching information, and (2) different representations carry complementary information. Based on these insights, we propose Uni-DPM, a compact self-supervised framework that completes these three tasks with a shared representation. Specifically, we introduce a U-Net-like model that completes multiple tasks synchronously by leveraging their common dependence on matching information, and iteratively refines the predictions by utilizing the reciprocity among tasks. Furthermore, we design a shared Appearance-Matching-Temporal (AMT) representation for these three tasks by exploiting the complementarity among different types of information. In addition, Uni-DPM is scalable to downstream tasks, including scene flow, optical flow, and motion segmentation. Comparative experiments demonstrate the competitiveness of Uni-DPM on these tasks, while ablation experiments verify our insights.
Pages: 1498-1511
Page count: 14
Related papers
100 in total
[31]   Self-Supervised Monocular Scene Flow Estimation [J].
Hur, Junhwa ;
Roth, Stefan .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR 2020), 2020, :7394-7403
[32]   Occlusions, Motion and Depth Boundaries with a Generic Network for Disparity, Optical Flow or Scene Flow Estimation [J].
Ilg, Eddy ;
Saikia, Tonmoy ;
Keuper, Margret ;
Brox, Thomas .
COMPUTER VISION - ECCV 2018, PT XII, 2018, 11216 :626-643
[33]   EffiScene: Efficient Per-Pixel Rigidity Inference for Unsupervised Joint Learning of Optical Flow, Depth, Camera Pose and Motion Segmentation [J].
Jiao, Yang ;
Tran, Trac D. ;
Shi, Guangming .
2021 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION, CVPR 2021, 2021, :5534-5543
[34]   Self-supervised Monocular Trained Depth Estimation using Self-attention and Discrete Disparity Volume [J].
Johnston, Adrian ;
Carneiro, Gustavo .
2020 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2020, :4755-4764
[35]   What Matters in Unsupervised Optical Flow [J].
Jonschkowski, Rico ;
Stone, Austin ;
Barron, Jonathan T. ;
Gordon, Ariel ;
Konolige, Kurt ;
Angelova, Anelia .
COMPUTER VISION - ECCV 2020, PT II, 2020, 12347 :557-572
[36]   Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation [J].
Jung, Hyunyoung ;
Park, Eunhyeok ;
Yoo, Sungjoo .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :12622-12632
[37]  
Kingma D. P., 2014, INT C LEARNING REPRE
[38]   Self-supervised Monocular Depth Estimation: Solving the Dynamic Object Problem by Semantic Guidance [J].
Klingner, Marvin ;
Termoehlen, Jan-Aike ;
Mikolajczyk, Jonas ;
Fingscheidt, Tim .
COMPUTER VISION - ECCV 2020, PT XX, 2020, 12365 :582-600
[39]   CoMoDA: Continuous Monocular Depth Adaptation Using Past Experiences [J].
Kuznietsov, Yevhen ;
Proesmans, Marc ;
Van Gool, Luc .
2021 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION WACV 2021, 2021, :2906-2916
[40]   Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [J].
Lee, Seokju ;
Rameau, Francois ;
Pan, Fei ;
Kweon, In So .
2021 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2021), 2021, :4842-4851