Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Cited by: 1
Authors
Francani, Andre O. [1]
Maximo, Marcos R. O. A. [1]
Affiliations
[1] Aeronaut Inst Technol, Autonomous Computat Syst Lab, BR-12228900 Sao Jose Dos Campos, SP, Brazil
Source
IEEE Access | 2025 / Vol. 13
Keywords
Transformers; Visual odometry; Feature extraction; Deep learning; Computer architecture; 6-DOF; Pipelines; Odometry; Vectors; Context modeling; monocular visual odometry; transformer; video understanding
DOI
10.1109/ACCESS.2025.3531667
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Estimating a camera's pose from images captured by a single camera is a traditional task in mobile robotics and autonomous vehicles. This problem is called monocular visual odometry, and it often relies on geometric approaches that require considerable engineering effort for a specific scenario. Deep learning methods have been shown to generalize after proper training with a large amount of available data. Transformer-based architectures have dominated the state of the art in natural language processing and in computer vision tasks such as image and video understanding. In this work, we treat monocular visual odometry as a video understanding task to estimate the 6 degrees of freedom of a camera's pose. We contribute the TSformer-VO model, which is based on spatio-temporal self-attention mechanisms, to extract features from clips and estimate the motions in an end-to-end manner. Our approach achieved performance competitive with state-of-the-art geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the DeepVO implementation widely accepted in the visual odometry community. The code is publicly available at https://github.com/aofrancani/TSformer-VO.
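
The abstract describes estimating 6-degree-of-freedom motions from clips; to obtain a trajectory, such frame-to-frame motions have to be chained together. The following is a minimal sketch of that chaining step only, not the TSformer-VO code. It assumes each motion is a tuple (tx, ty, tz, roll, pitch, yaw) expressed in the previous camera frame, with angles in radians and a Z-Y-X Euler convention. The function names and these conventions are hypothetical choices for illustration.

```python
import math

def euler_to_matrix(roll, pitch, yaw):
    """3x3 rotation R = Rz(yaw) * Ry(pitch) * Rx(roll) (assumed convention)."""
    cr, sr = math.cos(roll), math.sin(roll)
    cp, sp = math.cos(pitch), math.sin(pitch)
    cy, sy = math.cos(yaw), math.sin(yaw)
    return [
        [cy * cp, cy * sp * sr - sy * cr, cy * sp * cr + sy * sr],
        [sy * cp, sy * sp * sr + cy * cr, sy * sp * cr - cy * sr],
        [-sp, cp * sr, cp * cr],
    ]

def motion_to_transform(motion):
    """Build a 4x4 homogeneous transform from one 6-DoF motion tuple."""
    tx, ty, tz, roll, pitch, yaw = motion
    r = euler_to_matrix(roll, pitch, yaw)
    return [r[0] + [tx], r[1] + [ty], r[2] + [tz], [0.0, 0.0, 0.0, 1.0]]

def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def integrate_trajectory(motions):
    """Chain relative motions into absolute camera positions.

    Returns one (x, y, z) position per pose, starting at the origin.
    """
    pose = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    positions = [(0.0, 0.0, 0.0)]
    for motion in motions:
        # Each relative motion is applied in the current camera frame.
        pose = matmul4(pose, motion_to_transform(motion))
        positions.append((pose[0][3], pose[1][3], pose[2][3]))
    return positions

# Hypothetical motions: move 1 unit along x while turning 90 degrees in yaw,
# then move 1 unit along the (now rotated) x axis.
example_motions = [
    (1.0, 0.0, 0.0, 0.0, 0.0, math.pi / 2),
    (1.0, 0.0, 0.0, 0.0, 0.0, 0.0),
]
trajectory = integrate_trajectory(example_motions)
```

Because the relative transforms are multiplied in sequence, any error in one estimated motion propagates to all later positions, which is the usual source of drift in odometry.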
Pages: 13959-13971
Page count: 13
Related papers
50 in total
  • [1] SWformer-VO: A Monocular Visual Odometry Model Based on Swin Transformer
    Wu, Zhigang
    Zhu, Yaohui
    IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (05) : 4766 - 4773
  • [2] Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry
    Francani, Andre O.
    Maximo, Marcos R. O. A.
    2022 LATIN AMERICAN ROBOTICS SYMPOSIUM (LARS), 2022 BRAZILIAN SYMPOSIUM ON ROBOTICS (SBR), AND 2022 WORKSHOP ON ROBOTICS IN EDUCATION (WRE), 2022, : 312 - 317
  • [3] From Local Understanding to Global Regression in Monocular Visual Odometry
    Esfahani, Mandi Abolfazli
    Wu, Keyu
    Yuan, Shenghai
    Wang, Han
    INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (01)
  • [4] Unsupervised Monocular Visual Odometry Based on Confidence Evaluation
    Liu, Yiling
    Wang, Hesheng
    Wang, Jingchuan
    Wang, Xinlei
    IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (06) : 5387 - 5396
  • [5] Monocular Visual Odometry Based on Hybrid Parameterization
    Mohamed, Sherif A. S.
    Haghbayan, Mohammad-Hashem
    Heikkonen, Jukka
    Tenhunen, Hannu
    Plosila, Juha
    TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019), 2020, 11433
  • [6] Transformer guided geometry model for flow-based unsupervised visual odometry
    Li, Xiangyu
    Hou, Yonghong
    Wang, Pichao
    Gao, Zhimin
    Xu, Mingliang
    Li, Wanqing
    NEURAL COMPUTING & APPLICATIONS, 2021, 33 (13) : 8031 - 8042
  • [7] Transformer guided geometry model for flow-based unsupervised visual odometry
    Xiangyu Li
    Yonghong Hou
    Pichao Wang
    Zhimin Gao
    Mingliang Xu
    Wanqing Li
    Neural Computing and Applications, 2021, 33 : 8031 - 8042
  • [8] A Novel Approach to Improve the Precision of Monocular Visual Odometry
    Xiao, Chen
    Zhu, Xiaorui
    Feng, Wei
    Ou, Yongsheng
    2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 392 - 397
  • [9] A New Approach to Train Convolutional Neural Networks for Monocular Visual Odometry
    Esfahani, Mandi Abolfazli
    Wu, Keyu
    Yuan, Shenghai
    Wang, Han
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (ICPRAI 2018), 2018, : 66 - 71
  • [10] Transformer-Based Seismic Image Enhancement: A Novel Approach for Improved Resolution
    Park, Jin-Yeong
    Saad, Omar M.
    Oh, Ju-Won
    Alkhalifah, Tariq
    IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63