Scale-Aware Visual-Inertial Depth Estimation and Odometry Using Monocular Self-Supervised Learning

Cited by: 4
Authors
Lee, Chungkeun [1 ]
Kim, Changhyeon [2 ]
Kim, Pyojin [3 ]
Lee, Hyeonbeom [4 ]
Kim, H. Jin [5 ]
Affiliations
[1] Seoul Natl Univ, Inst Adv Aerosp Technol, Seoul 08826, South Korea
[2] Seoul Natl Univ, Automation & Syst Res Inst, Seoul 08826, South Korea
[3] Sookmyung Womens Univ, Dept Mech Syst Engn, Seoul 04312, South Korea
[4] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 37224, South Korea
[5] Seoul Natl Univ, Dept Mech & Aerosp Engn, Seoul 08826, South Korea
Funding
National Research Foundation of Singapore
Keywords
Odometry; Deep learning; Loss measurement; Depth measurement; Cameras; Self-supervised learning; Coordinate measuring machines; monocular depth estimation; self-supervised learning; visual-inertial odometry;
DOI
10.1109/ACCESS.2023.3252884
Chinese Library Classification (CLC)
TP [Automation Technology; Computer Technology]
Discipline code
0812
Abstract
Scale ambiguity is a key issue for real-world applications that rely on a single monocular camera. Because self-supervised, data-driven approaches cannot resolve the scale ambiguity without additional data containing scale information, state-of-the-art deep-learning-based methods address this issue by learning the scale from additional sensor measurements. In this regard, the inertial measurement unit (IMU) is a popular sensor for various mobile platforms because it is lightweight and inexpensive. However, unlike supervised learning, which can learn the scale from ground-truth information, learning the scale from an IMU in a self-supervised setting is challenging. We propose a scale-aware monocular visual-inertial depth estimation and odometry method with end-to-end training. To learn the scale from IMU measurements with end-to-end training in the monocular self-supervised setup, we propose a new loss function, named the preintegration loss, which trains scale-aware ego-motion by comparing the ego-motion integrated from IMU measurements with the predicted ego-motion. Since the gravity and the bias must be compensated to obtain the ego-motion by integrating IMU measurements, we design a network that predicts the gravity and the bias in addition to the ego-motion and the depth map. The overall performance of the proposed method is compared with state-of-the-art methods on a popular outdoor driving dataset (KITTI) and an author-collected indoor driving dataset. On the KITTI dataset, the proposed method shows competitive performance against state-of-the-art monocular depth estimation and odometry methods: a root-mean-square error of 5.435 m on the KITTI Eigen split, and absolute trajectory errors of 22.46 m and 0.2975 degrees on the KITTI odometry 09 sequence. Unlike other up-to-scale monocular methods, the proposed method can estimate metric-scaled depth and camera poses.
Additional experiments on the author-collected indoor driving dataset qualitatively confirm the accuracy of the metric-depth and metric-pose estimates.
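The preintegration loss described in the abstract compares the ego-motion obtained by integrating IMU measurements (with the network-predicted gravity and bias removed) against the network-predicted ego-motion. The following is a minimal NumPy sketch of that idea only, not the paper's actual formulation: the function names, the simple Euler integration, and the Frobenius rotation distance are all illustrative assumptions.

```python
import numpy as np

def rotvec_to_mat(theta):
    """Rodrigues' formula: rotation vector -> rotation matrix."""
    angle = np.linalg.norm(theta)
    if angle < 1e-12:
        return np.eye(3)
    k = theta / angle
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def preintegrate_imu(accels, gyros, dt, gravity, acc_bias, gyro_bias):
    """Integrate raw IMU samples into a relative pose (R, p).

    accels, gyros: (N, 3) raw accelerometer / gyroscope samples.
    gravity, acc_bias, gyro_bias: (3,) vectors; in the paper these are
    predicted by the network (hypothetical interface here).
    Here `gravity` is the specific force a stationary, upright IMU reads.
    """
    R = np.eye(3)      # accumulated rotation
    v = np.zeros(3)    # accumulated velocity
    p = np.zeros(3)    # accumulated translation
    for a, w in zip(accels, gyros):
        # world-frame acceleration with bias and gravity compensated
        a_world = R @ (a - acc_bias) - gravity
        p = p + v * dt + 0.5 * a_world * dt**2
        v = v + a_world * dt
        # small rotation update from the bias-corrected gyro sample
        R = R @ rotvec_to_mat((w - gyro_bias) * dt)
    return R, p

def preintegration_loss(R_pred, t_pred, R_imu, t_imu):
    """Penalize disagreement between predicted and IMU-integrated ego-motion."""
    rot_err = np.linalg.norm(R_pred - R_imu)   # Frobenius distance (simplified)
    trans_err = np.linalg.norm(t_pred - t_imu)
    return rot_err + trans_err
```

Because the IMU-integrated translation is metric, minimizing this loss forces the predicted ego-motion (and hence the jointly trained depth) toward metric scale, which is the mechanism the abstract describes.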
Pages: 24087-24102 (16 pages)