3D Hierarchical Refinement and Augmentation for Unsupervised Learning of Depth and Pose From Monocular Video

Cited by: 16

Authors:

Wang, Guangming [1]

Zhong, Jiquan [1]

Zhao, Shijie [2]

Wu, Wenhua [1]

Liu, Zhe [3]

Wang, Hesheng [1]

Affiliations:

[1] Shanghai Jiao Tong Univ, Shanghai Engn Res Ctr Intelligent Control & Manage, Key Lab Syst Control & Informat Proc, Key Lab Marine Intelligent Equipment, Dept Automat, Shanghai 200240, Peoples R China

[2] Shanghai Jiao Tong Univ, Dept Engn Mech, Shanghai 200240, Peoples R China

[3] Shanghai Jiao Tong Univ, AI Inst, MOE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China

Source:

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY | 2023, Vol. 33, No. 04

Keywords:

Monocular depth estimation; visual odometry; unsupervised learning; pose refinement; 3D augmentation; VIEW SYNTHESIS; REMOVAL;

DOI:

10.1109/TCSVT.2022.3215587

Chinese Library Classification:

TM [Electrical Engineering]; TN [Electronics and Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Depth and ego-motion estimation are essential for the localization and navigation of autonomous robots and autonomous driving. Recent studies make it possible to learn per-pixel depth and ego-motion from unlabeled monocular video. In this paper, a novel unsupervised training framework is proposed with 3D hierarchical refinement and augmentation using explicit 3D geometry. In this framework, the depth and pose estimations are hierarchically and mutually coupled to refine the estimated pose layer by layer. An intermediate view image is proposed and synthesized by warping the pixels in an image with the estimated depth and coarse pose. Then, the residual pose transformation can be estimated from the new view image and the image of the adjacent frame to refine the coarse pose. The iterative refinement is implemented in a differentiable manner, so the whole framework can be optimized uniformly. Meanwhile, a new image augmentation method is proposed for pose estimation by synthesizing a new view image, which augments the pose in 3D space but yields a new augmented 2D image. The experiments on KITTI demonstrate that our depth estimation achieves state-of-the-art performance and even surpasses recent approaches that utilize other auxiliary tasks. Our visual odometry outperforms all recent unsupervised monocular learning-based methods and achieves competitive performance to the geometry-based method, ORB-SLAM2 with back-end optimization. The source codes will be released soon at: https://github.com/IRMVLab/HRANet.
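
The geometry behind the warping and refinement steps in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a pinhole camera with hypothetical intrinsics (`fx`, `fy`, `cx`, `cy`), poses written as 4x4 homogeneous matrices, and a refinement rule in which the residual pose is left-multiplied onto the coarse pose. The function names `reproject` and `refine_pose` are invented for illustration, and in the paper both poses are predicted by networks rather than given by hand.

```python
def matmul(a, b):
    """Multiply two matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def translation(tx, ty, tz):
    """4x4 homogeneous pose with identity rotation (kept simple for the sketch)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def reproject(u, v, depth, fx, fy, cx, cy, pose):
    """Warp one pixel into another view using its depth and a relative pose."""
    # Back-project the pixel to a 3D point in the source camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    point = [[x], [y], [depth], [1.0]]
    # Move the point into the target camera frame.
    xt, yt, zt, _ = (row[0] for row in matmul(pose, point))
    # Project back onto the target image plane.
    return fx * xt / zt + cx, fy * yt / zt + cy


def refine_pose(coarse, residual):
    """Assumed refinement rule: apply the residual pose after the coarse pose."""
    return matmul(residual, coarse)


# Hypothetical numbers: a pixel at the principal point, 10 units away.
intrinsics = dict(fx=100.0, fy=100.0, cx=50.0, cy=50.0)
coarse = translation(1.0, 0.0, 0.0)
residual = translation(0.5, 0.0, 0.0)

coarse_uv = reproject(50.0, 50.0, 10.0, pose=coarse, **intrinsics)
refined_uv = reproject(50.0, 50.0, 10.0,
                       pose=refine_pose(coarse, residual), **intrinsics)
print(coarse_uv, refined_uv)
```

Warping every pixel of a frame with the coarse pose would give the intermediate view; the residual pose estimated between that view and the adjacent frame then corrects the remaining misalignment, which is what `refine_pose` composes under the stated assumption.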


Pages: 1776-1786

Page count: 11

50 records in total

[21] LEARNING MONOCULAR 3D HUMAN POSE ESTIMATION WITH SKELETAL INTERPOLATION
Chen, Ziyi
Sugimoto, Akihiro
Lai, Shang-Hong
2022 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2022, : 4218 - 4222
[22] Unsupervised 3D Human Pose Estimation in Multi-view-multi-pose Video
Sun, Cheng
Thomas, Diego
Kawasaki, Hiroshi
2020 25TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION (ICPR), 2021, : 5959 - 5964
[23] Recovering 3D human pose from monocular images
Agarwal, A
Triggs, B
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2006, 28 (01) : 44 - 58
[24] Unsupervised Learning of Depth and Pose Based on Monocular Camera and Inertial Measurement Unit (IMU)
Wang, Yanbo
Yang, Hanwen
Cai, Jianwei
Wang, Guangming
Wang, Jingchuan
Huang, Yi
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023), 2023, : 10010 - 10017
[25] Unsupervised Scale-consistent Depth and Ego-motion Learning from Monocular Video
Bian, Jia-Wang
Li, Zhichao
Wang, Naiyan
Zhan, Huangying
Shen, Chunhua
Cheng, Ming-Ming
Reid, Ian
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[26] Real-Time 3D Pose Reconstruction of Human Body from Monocular Video Sequences
Zhu, LiangJia
Hwang, Jenq-Neng
Chen, Chih-Chang
Lin, Ming-Hui
Yen, Chen-Lan
ISCAS: 2009 IEEE INTERNATIONAL SYMPOSIUM ON CIRCUITS AND SYSTEMS, VOLS 1-5, 2009, : 717 - +
[27] APAC-Net: Unsupervised Learning of Depth and Ego-Motion from Monocular Video
Lin, Rui
Lu, Yao
Lu, Guangming
INTELLIGENCE SCIENCE AND BIG DATA ENGINEERING: VISUAL DATA ENGINEERING, PT I, 2019, 11935 : 336 - 348
[28] Dual Networks Based 3D Multi-Person Pose Estimation From Monocular Video
Cheng, Yu
Wang, Bo
Tan, Robby T. T.
IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2023, 45 (02) : 1636 - 1651
[29] eGAC3D: enhancing depth adaptive convolution and depth estimation for monocular 3D object pose detection
Ngo, Duc Tuan
Bui, Minh-Quan Viet
Nguyen, Duc Dung
Pham, Hoang-Anh
PEERJ COMPUTER SCIENCE, 2022, 8
[30] Trajectory Optimization for Physics-Based Reconstruction of 3d Human Pose from Monocular Video
Gartner, Erik
Andriluka, Mykhaylo
Xu, Hongyi
Sminchisescu, Cristian
2022 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2022, : 13096 - 13105
