Deep Direct Visual Odometry

Times Cited: 23
Authors
Zhao, Chaoqiang [1 ]
Tang, Yang [1 ]
Sun, Qiyu [1 ]
Vasilakos, Athanasios V. [2 ,3 ,4 ]
Affiliations
[1] East China Univ Sci & Technol, Key Lab Smart Mfg Energy Chem Proc, Minist Educ, Shanghai 200237, Peoples R China
[2] Univ Technol Sydney, Sch Elect & Data Engn, Sydney, NSW, Australia
[3] Fuzhou Univ, Coll Math & Comp Sci, Fuzhou 350116, Peoples R China
[4] Lulea Univ Technol, Dept Comp Sci Elect & Space Engn, S-97187 Lulea, Sweden
Funding
National Natural Science Foundation of China;
Keywords
Visual odometry; direct methods; pose estimation; deep learning; unsupervised learning; VISION; SYSTEM; POSE;
DOI
10.1109/TITS.2021.3071886
Chinese Library Classification
TU [Architecture Science];
Discipline Code
0813;
Abstract
Traditional monocular direct visual odometry (DVO) is one of the best-known methods for simultaneously estimating the ego-motion of robots and mapping environments from images. However, DVO relies heavily on high-quality images and accurate initial pose estimates during tracking. Building on the strong performance of deep learning, previous works have shown that deep neural networks can effectively learn 6-DoF (Degree of Freedom) poses between frames from monocular image sequences in an unsupervised manner. However, these unsupervised deep learning-based frameworks cannot accurately recover the full trajectory of a long monocular video because of the scale inconsistency between the estimated poses. To address this problem, we use several geometric constraints to improve the scale consistency of the pose network, including an improved version of the previous loss function and a novel scale-to-trajectory constraint for unsupervised training. We call the pose network trained with the proposed constraints TrajNet. In addition, a new DVO architecture, called deep direct sparse odometry (DDSO), is proposed to overcome the drawbacks of the previous direct sparse odometry (DSO) framework by embedding TrajNet. Extensive experiments on the KITTI dataset show that the proposed constraints effectively improve the scale consistency of TrajNet compared with previous unsupervised monocular methods, and that integrating TrajNet makes the initialization and tracking of DSO more robust and accurate.
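The record does not detail the scale-to-trajectory constraint itself, so the following is only a minimal sketch of how scale-consistency terms for an unsupervised pose network might be written in a PyTorch-style setup. The function names (scale_consistency_loss, trajectory_scale_penalty) and the specific formulas are illustrative assumptions, not the authors' formulation; in an actual training loop such terms would be added to the photometric reprojection loss commonly used by unsupervised depth-and-pose frameworks.

```python
import torch

def scale_consistency_loss(depth_a, depth_b_warped, eps=1e-6):
    # Hypothetical geometry/scale consistency term: penalize normalized
    # differences between the depth of one frame and the depth of a
    # neighboring frame warped into the same view, encouraging both
    # predictions to share a common scale.
    diff = (depth_a - depth_b_warped).abs() / (depth_a + depth_b_warped).clamp(min=eps)
    return diff.mean()

def trajectory_scale_penalty(rel_translations):
    # Hypothetical "scale-to-trajectory"-style term: keep the norms of the
    # relative translations along a training snippet close to their mean,
    # so the accumulated trajectory does not drift in scale.
    norms = torch.stack([t.norm() for t in rel_translations])
    return (norms - norms.mean()).abs().mean()

# Toy usage with random tensors standing in for network outputs.
depth_a = torch.rand(1, 1, 64, 64) + 0.1
depth_b_warped = torch.rand(1, 1, 64, 64) + 0.1
rel_translations = [torch.rand(3) for _ in range(4)]

total = scale_consistency_loss(depth_a, depth_b_warped) \
        + 0.1 * trajectory_scale_penalty(rel_translations)
print(float(total))
```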
Pages: 7733 - 7742
Page count: 10