3D Hierarchical Refinement and Augmentation for Unsupervised Learning of Depth and Pose From Monocular Video

Cited by: 16
Authors
Wang, Guangming [1]
Zhong, Jiquan [1]
Zhao, Shijie [2]
Wu, Wenhua [1]
Liu, Zhe [3]
Wang, Hesheng [1]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai Engn Res Ctr Intelligent Control & Manage, Key Lab Syst Control & Informat Proc, Key Lab Marine Intelligent Equipment, Dept Automat, Shanghai 200240, Peoples R China
[2] Shanghai Jiao Tong Univ, Dept Engn Mech, Shanghai 200240, Peoples R China
[3] Shanghai Jiao Tong Univ, AI Inst, MOE Key Lab Artificial Intelligence, Shanghai 200240, Peoples R China
Keywords
Monocular depth estimation; visual odometry; unsupervised learning; pose refinement; 3D augmentation; view synthesis; removal
DOI
10.1109/TCSVT.2022.3215587
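Chinese Library Classification (CLC)
TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology]
Discipline classification code
0808; 0809
Abstract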
Depth and ego-motion estimation are essential for localization and navigation in autonomous robots and autonomous driving. Recent studies have made it possible to learn per-pixel depth and ego-motion from unlabeled monocular video. This paper proposes a novel unsupervised training framework with 3D hierarchical refinement and augmentation using explicit 3D geometry. In this framework, depth and pose estimation are hierarchically and mutually coupled so that the estimated pose is refined layer by layer. An intermediate view image is synthesized by warping the pixels of an image with the estimated depth and the coarse pose. The residual pose transformation is then estimated from this new view image and the image of the adjacent frame, and is used to refine the coarse pose. The iterative refinement is implemented in a differentiable manner, so the whole framework can be optimized jointly. In addition, a new image augmentation method is proposed for pose estimation: it synthesizes a new view image, augmenting the pose in 3D space while producing a new augmented 2D image. Experiments on KITTI demonstrate that the depth estimation achieves state-of-the-art performance and even surpasses recent approaches that utilize other auxiliary tasks. The visual odometry outperforms all recent unsupervised monocular learning-based methods and achieves performance competitive with the geometry-based method ORB-SLAM2 with back-end optimization. The source code will be released soon at: https://github.com/IRMVLab/HRANet.
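The geometric step the abstract describes (warping pixels with an estimated depth and a coarse pose, then refining that pose with a residual transform) can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `reproject` and `refine_pose`, the pinhole intrinsics `K`, the 4x4 homogeneous pose convention, and the composition order `T_residual @ T_coarse` are all assumptions made for illustration. The sketch only computes where each pixel lands in the other view; a full view synthesis would also sample image colors at those coordinates.

```python
import numpy as np

def reproject(depth, K, T):
    """Compute where each source pixel lands in another view.

    depth: (H, W) per-pixel depth of the source image.
    K:     (3, 3) pinhole intrinsics (assumed shared by both views).
    T:     (4, 4) rigid transform from the source camera to the other view.
    Returns an (H, W, 2) array of (u, v) coordinates in the other view.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates, one column per pixel (3 x N).
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T.astype(float)
    # Back-project to 3D points in the source camera frame.
    pts = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    # Move the points into the other camera frame.
    pts = T[:3, :3] @ pts + T[:3, 3:4]
    # Project back to the image plane.
    proj = K @ pts
    uv = proj[:2] / proj[2:3]
    return uv.T.reshape(H, W, 2)

def refine_pose(T_coarse, T_residual):
    """Compose a coarse pose with a residual pose (assumed ordering).

    T_coarse maps the source camera to the intermediate view, and
    T_residual maps the intermediate view to the adjacent frame.
    """
    return T_residual @ T_coarse

# Hypothetical example values, chosen only to make the geometry easy to check.
K = np.array([[100.0, 0.0, 2.0],
              [0.0, 100.0, 1.0],
              [0.0, 0.0, 1.0]])
depth = np.full((3, 4), 10.0)
T_coarse = np.eye(4)
T_coarse[0, 3] = 1.0          # translate 1 unit along x
T_residual = np.eye(4)
T_residual[0, 3] = 0.5        # small correction along x
T_refined = refine_pose(T_coarse, T_residual)
coords = reproject(depth, K, T_refined)
```

With these example values, the coarse pose alone shifts every pixel by fx * tx / depth = 100 * 1 / 10 = 10 pixels horizontally, and the refined pose shifts it by 15 pixels.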
Pages: 1776 - 1786
Number of pages: 11
Related papers
50 items in total
  • [41] A Review of 3D Pose Estimation from a Monocular Image Sequence
    Shan Gan-lin
    Ji Bing
    Zhou Yun-feng
    PROCEEDINGS OF THE 2009 2ND INTERNATIONAL CONGRESS ON IMAGE AND SIGNAL PROCESSING, VOLS 1-9, 2009 : 1946 - 1950
  • [42] Depth-based 3D human pose refinement: Evaluating the refinet framework
    D'Eusanio, Andrea
    Simoni, Alessandro
    Pini, Stefano
    Borghi, Guido
    Vezzani, Roberto
    Cucchiara, Rita
    PATTERN RECOGNITION LETTERS, 2023, 171 : 185 - 191
  • [43] 3D Face pose estimation and tracking from a monocular camera
    Ji, Q
    IMAGE AND VISION COMPUTING, 2002, 20 (07) : 499 - 511
  • [44] Adapted human pose: monocular 3D human pose estimation with zero real 3D pose data
    Shuangjun Liu
    Naveen Sehgal
    Sarah Ostadabbas
    Applied Intelligence, 2022, 52 : 14491 - 14506
  • [45] Limb Pose Aware Networks for Monocular 3D Pose Estimation
    Wu, Lele
    Yu, Zhenbo
    Liu, Yijiang
    Liu, Qingshan
    IEEE TRANSACTIONS ON IMAGE PROCESSING, 2022, 31 : 906 - 917
  • [46] Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras
    Gordon, Ariel
    Li, Hanhan
    Jonschkowski, Rico
    Angelova, Anelia
    2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019), 2019 : 8976 - 8985
  • [47] Monocular 3D Object Detection with Depth from Motion
    Wang, Tai
    Pang, Jiangmiao
    Lin, Dahua
    COMPUTER VISION, ECCV 2022, PT IX, 2022, 13669 : 386 - 403
  • [48] Revisiting Depth-guided Methods for Monocular 3D Object Detection by Hierarchical Balanced Depth
    Chen, Yi-Rong
    Tseng, Ching-Yu
    Liou, Yi-Syuan
    Wu, Tsung-Han
    Hsu, Winston H.
    CONFERENCE ON ROBOT LEARNING, VOL 229, 2023, 229
  • [49] Unsupervised Hierarchical Iterative Tile Refinement Network With 3D Planar Segmentation Loss
    Yang, Ruizhi
    Li, Xingqiang
    Cong, Rigang
    Du, Jinsong
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (03) : 2678 - 2685
  • [50] A survey on monocular 3D human pose estimation
    Ji X.
    Fang Q.
    Dong J.
    Shuai Q.
    Jiang W.
    Zhou X.
    Virtual Reality and Intelligent Hardware, 2020, 2 (06) : 471 - 500