SelfVIO: Self-supervised deep monocular Visual-Inertial Odometry and depth estimation

Cited by: 45
Authors
Almalioglu, Yasin [1 ]
Turan, Mehmet [2 ]
Saputra, Muhamad Risqi U. [1 ]
de Gusmao, Pedro P. B. [1 ]
Markham, Andrew [1 ]
Trigoni, Niki [1 ]
Affiliations
[1] Univ Oxford, Comp Sci Dept, Oxford, England
[2] Bogazici Univ, Inst Biomed Engn, Istanbul, Turkey
Funding
UK Engineering and Physical Sciences Research Council;
Keywords
Self-supervised learning; Geometry reconstruction; Machine perception; Generative adversarial networks; Deep sensor fusion; visual-inertial odometry; OPTICAL-FLOW; EGO-MOTION; NETWORK; VISION; SLAM; CALIBRATION; PREDICTION; VERSATILE; FUSION; ROBUST;
DOI
10.1016/j.neunet.2022.03.005
Chinese Library Classification
TP18 [Artificial intelligence theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the last decade, numerous supervised deep learning approaches have been proposed for visual-inertial odometry (VIO) and depth map estimation, which require large amounts of labelled data. To overcome the data limitation, self-supervised learning has emerged as a promising alternative that exploits constraints such as geometric and photometric consistency in the scene. In this study, we present a novel self-supervised deep learning-based VIO and depth map recovery approach (SelfVIO) using adversarial training and self-adaptive visual-inertial sensor fusion. SelfVIO learns the joint estimation of 6 degrees-of-freedom (6-DoF) ego-motion and a depth map of the scene from unlabelled monocular RGB image sequences and inertial measurement unit (IMU) readings. The proposed approach is able to perform VIO without requiring IMU intrinsic parameters and/or extrinsic calibration between the IMU and the camera. We provide comprehensive quantitative and qualitative evaluations of the proposed framework and compare its performance with state-of-the-art VIO, VO, and visual simultaneous localization and mapping (VSLAM) approaches on the KITTI, EuRoC and Cityscapes datasets. Detailed comparisons show that SelfVIO outperforms state-of-the-art VIO approaches in terms of pose estimation and depth recovery, making it a promising approach among existing methods in the literature. (c) 2022 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
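The photometric-consistency constraint mentioned in the abstract is the standard self-supervision signal for joint depth and ego-motion learning: a source frame is warped into the target view using the predicted depth and relative pose, and the reconstruction error supervises both networks. The sketch below is an illustrative NumPy version of that loss, not the paper's implementation; the intrinsics `K`, the 4x4 relative pose `T`, and nearest-neighbour sampling (real systems use differentiable bilinear sampling) are simplifying assumptions.

```python
import numpy as np

def photometric_loss(target, source, depth, T, K):
    """Mean L1 photometric error between a target frame and the source
    frame warped into the target view, given per-pixel depth, a 4x4
    relative pose T (target camera -> source camera) and intrinsics K.
    Nearest-neighbour sampling keeps the sketch short."""
    H, W = target.shape
    # Pixel grid of the target image in homogeneous coordinates.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u.ravel(), v.ravel(), np.ones(H * W)], axis=0)
    # Back-project target pixels to 3D points using the predicted depth.
    cam = (np.linalg.inv(K) @ pix) * depth.reshape(1, -1)
    cam_h = np.vstack([cam, np.ones((1, H * W))])
    # Transform into the source camera frame and project with K.
    src = K @ (T @ cam_h)[:3]
    us = np.round(src[0] / src[2]).astype(int)
    vs = np.round(src[1] / src[2]).astype(int)
    # Keep only projections that land inside the source image.
    valid = (us >= 0) & (us < W) & (vs >= 0) & (vs < H) & (src[2] > 0)
    diff = np.abs(target.ravel()[valid] - source[vs[valid], us[valid]])
    return diff.mean()
```

With an identity relative pose the warp is the identity, so the loss between a frame and itself is zero; during training, gradients of this error with respect to `depth` and `T` (produced by the depth and pose networks) drive the self-supervised learning.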
Pages: 119-136
Page count: 18