Scale-Aware Visual-Inertial Depth Estimation and Odometry Using Monocular Self-Supervised Learning

Cited by: 4
Authors
Lee, Chungkeun [1 ]
Kim, Changhyeon [2 ]
Kim, Pyojin [3 ]
Lee, Hyeonbeom [4 ]
Kim, H. Jin [5 ]
Affiliations
[1] Seoul Natl Univ, Inst Adv Aerosp Technol, Seoul 08826, South Korea
[2] Seoul Natl Univ, Automation & Syst Res Inst, Seoul 08826, South Korea
[3] Sookmyung Womens Univ, Dept Mech Syst Engn, Seoul 04312, South Korea
[4] Kyungpook Natl Univ, Sch Elect & Elect Engn, Daegu 37224, South Korea
[5] Seoul Natl Univ, Dept Mech & Aerosp Engn, Seoul 08826, South Korea
Funding
National Research Foundation of Singapore
Keywords
Odometry; Deep learning; Loss measurement; Depth measurement; Cameras; Self-supervised learning; Coordinate measuring machines; monocular depth estimation; self-supervised learning; visual-inertial odometry;
DOI
10.1109/ACCESS.2023.3252884
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline Code
0812
Abstract
For real-world applications with a single monocular camera, scale ambiguity is an important issue. Self-supervised data-driven approaches that use no additional data containing scale information cannot resolve this ambiguity, so state-of-the-art deep-learning-based methods address it by learning the scale from additional sensor measurements. In that regard, the inertial measurement unit (IMU) is a popular sensor for various mobile platforms because it is lightweight and inexpensive. However, unlike supervised learning, which can learn the scale from ground-truth information, learning the scale from an IMU is challenging in a self-supervised setting. We propose a scale-aware monocular visual-inertial depth estimation and odometry method with end-to-end training. To learn the scale from IMU measurements with end-to-end training in the monocular self-supervised setup, we propose a new loss function, named the preintegration loss, which trains scale-aware ego-motion by comparing the ego-motion integrated from IMU measurements with the predicted ego-motion. Since gravity and bias must be compensated to obtain the ego-motion by integrating IMU measurements, we design a network that predicts the gravity and the bias in addition to the ego-motion and the depth map. The overall performance of the proposed method is compared with state-of-the-art methods on a popular outdoor driving dataset, the KITTI dataset, and on an author-collected indoor driving dataset. On the KITTI dataset, the proposed method shows competitive performance against state-of-the-art monocular depth estimation and odometry methods: a root-mean-square error of 5.435 m on the KITTI Eigen split, and absolute trajectory errors of 22.46 m and 0.2975 degrees on the KITTI odometry 09 sequence. Unlike other up-to-scale monocular methods, the proposed method can estimate metric-scaled depth and camera poses.
Additional experiments on the author-collected indoor driving dataset qualitatively confirm the accuracy of the metric depth and pose estimates.
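The abstract's core idea, comparing network-predicted ego-motion against ego-motion integrated from raw IMU samples after compensating gravity and bias, can be illustrated with a minimal sketch. This is not the paper's implementation: it uses plain Euler integration with a small-angle rotation update (a full method would use on-manifold IMU preintegration), and all names (`preintegrate_imu`, `preintegration_loss`) and the squared-error form of the loss are illustrative assumptions.

```python
import numpy as np

def preintegrate_imu(accels, gyros, dt, gravity, acc_bias, gyro_bias):
    """Integrate raw IMU samples into a relative translation (and rotation).

    Simplified Euler integration for illustration only; the biases and
    gravity play the role of the quantities the network must predict so
    that the integrated ego-motion is metrically correct.
    """
    R = np.eye(3)      # accumulated rotation (body -> world)
    v = np.zeros(3)    # accumulated velocity in the world frame
    p = np.zeros(3)    # accumulated translation in the world frame
    for a, w in zip(accels, gyros):
        a_corr = a - acc_bias          # bias-compensated accelerometer
        w_corr = w - gyro_bias         # bias-compensated gyroscope
        a_world = R @ a_corr - gravity # gravity-compensated, world frame
        p = p + v * dt + 0.5 * a_world * dt**2
        v = v + a_world * dt
        # first-order (small-angle) rotation update
        wx, wy, wz = w_corr * dt
        dR = np.array([[1.0, -wz,  wy],
                       [ wz, 1.0, -wx],
                       [-wy,  wx, 1.0]])
        R = R @ dR
    return R, p

def preintegration_loss(pred_translation, accels, gyros, dt,
                        pred_gravity, pred_acc_bias, pred_gyro_bias):
    """Squared error between predicted and IMU-integrated translation."""
    _, p_imu = preintegrate_imu(accels, gyros, dt,
                                pred_gravity, pred_acc_bias, pred_gyro_bias)
    return float(np.sum((pred_translation - p_imu) ** 2))
```

Because the loss is a function of the predicted gravity and biases as well as the predicted pose, gradients flow to all of these network outputs during end-to-end training, which is how scale information from the IMU reaches the depth and pose branches.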
Pages: 24087-24102
Page count: 16