Scale-Aware Visual-Inertial Depth Estimation and Odometry Using Monocular Self-Supervised Learning

Cited by: 4
Authors
Lee, Chungkeun [1 ]
Kim, Changhyeon [2 ]
Kim, Pyojin [3 ]
Lee, Hyeonbeom [4 ]
Kim, H. Jin [5 ]
Affiliations
[1] Seoul Natl Univ, Inst Adv Aerosp Technol, Seoul 08826, South Korea
[2] Seoul Natl Univ, Automation & Syst Res Inst, Seoul 08826, South Korea
[3] Sookmyung Womens Univ, Dept Mech Syst Engn, Seoul 04312, South Korea
[4] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 37224, South Korea
[5] Seoul Natl Univ, Dept Mech & Aerosp Engn, Seoul 08826, South Korea
Funding
National Research Foundation of Singapore
Keywords
Odometry; Deep learning; Loss measurement; Depth measurement; Cameras; Self-supervised learning; Coordinate measuring machines; monocular depth estimation; self-supervised learning; visual-inertial odometry;
DOI
10.1109/ACCESS.2023.3252884
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
For real-world applications with a single monocular camera, scale ambiguity is an important issue. Self-supervised data-driven approaches that use no additional data containing scale information cannot resolve this ambiguity, so state-of-the-art deep-learning-based methods address it by learning the scale from additional sensor measurements. In that regard, the inertial measurement unit (IMU) is a popular sensor for various mobile platforms because it is lightweight and inexpensive. However, unlike supervised learning, which can learn the scale from ground-truth information, learning the scale from an IMU is challenging in a self-supervised setting. We propose a scale-aware monocular visual-inertial depth estimation and odometry method with end-to-end training. To learn the scale from IMU measurements with end-to-end training in the monocular self-supervised setup, we propose a new loss function, named the preintegration loss, which trains scale-aware ego-motion by comparing the ego-motion integrated from IMU measurements with the predicted ego-motion. Since gravity and bias must be compensated to obtain the ego-motion by integrating IMU measurements, we design a network that predicts the gravity and the bias in addition to the ego-motion and the depth map. The overall performance of the proposed method is compared with state-of-the-art methods on a popular outdoor driving dataset, the KITTI dataset, and on an author-collected indoor driving dataset. On the KITTI dataset, the proposed method shows competitive performance against state-of-the-art monocular depth estimation and odometry methods: a root-mean-square error of 5.435 m on the KITTI Eigen split, and absolute trajectory errors of 22.46 m and 0.2975 degrees on the KITTI odometry 09 sequence. Unlike other up-to-scale monocular methods, the proposed method can estimate metric-scaled depth and camera poses.
Additional experiments on the author-collected indoor driving dataset qualitatively confirm the accuracy of the metric depth and pose estimates.
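The abstract's core idea, comparing network-predicted ego-motion against ego-motion integrated from raw IMU samples after compensating gravity and bias, can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain Euler integration with a small-angle rotation update (a full method would use on-manifold IMU preintegration), and all names (`preintegrate_imu`, `preintegration_loss`) and the squared-error form of the loss are illustrative assumptions.

```python
import numpy as np

def preintegrate_imu(accels, gyros, dt, gravity, acc_bias, gyro_bias):
    """Integrate raw IMU samples into a relative translation (and rotation).

    Simplified Euler integration for illustration only; the biases and
    gravity play the role of the quantities the network must predict so
    that the integrated ego-motion is metrically correct.
    """
    R = np.eye(3)      # accumulated rotation (body -> world)
    v = np.zeros(3)    # accumulated velocity in the world frame
    p = np.zeros(3)    # accumulated translation in the world frame
    for a, w in zip(accels, gyros):
        a_corr = a - acc_bias          # bias-compensated accelerometer
        w_corr = w - gyro_bias         # bias-compensated gyroscope
        a_world = R @ a_corr - gravity # gravity-compensated, world frame
        p = p + v * dt + 0.5 * a_world * dt**2
        v = v + a_world * dt
        # first-order (small-angle) rotation update
        wx, wy, wz = w_corr * dt
        dR = np.array([[1.0, -wz,  wy],
                       [ wz, 1.0, -wx],
                       [-wy,  wx, 1.0]])
        R = R @ dR
    return R, p

def preintegration_loss(pred_translation, accels, gyros, dt,
                        pred_gravity, pred_acc_bias, pred_gyro_bias):
    """Squared error between predicted and IMU-integrated translation."""
    _, p_imu = preintegrate_imu(accels, gyros, dt,
                                pred_gravity, pred_acc_bias, pred_gyro_bias)
    return float(np.sum((pred_translation - p_imu) ** 2))
```

Because the loss is a function of the predicted gravity and biases as well as the predicted pose, gradients flow to all of these network outputs during end-to-end training, which is how scale information from the IMU reaches the depth and pose branches.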
Pages: 24087-24102
Page count: 16