Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Cited by: 1
Authors
Francani, Andre O. [1 ]
Maximo, Marcos R. O. A. [1 ]
Affiliations
[1] Aeronaut Inst Technol, Autonomous Computat Syst Lab, BR-12228900 Sao Jose Dos Campos, SP, Brazil
Source
IEEE ACCESS, 2025, Vol. 13
Keywords
Transformers; Visual odometry; Feature extraction; Deep learning; Computer architecture; 6-DOF; Pipelines; Odometry; Vectors; Context modeling; monocular visual odometry; transformer; video understanding;
DOI
10.1109/ACCESS.2025.3531667
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject classification code
0812
Abstract
Estimating the camera pose from the images of a single camera is a traditional task in mobile robots and autonomous vehicles. This problem, called monocular visual odometry, often relies on geometric approaches that require considerable engineering effort for each specific scenario. Deep learning methods have been shown to generalize well after proper training on a large amount of data. Transformer-based architectures have dominated the state of the art in natural language processing and in computer vision tasks such as image and video understanding. In this work, we treat monocular visual odometry as a video understanding task and estimate the 6 degrees of freedom of the camera pose. We contribute the TSformer-VO model, which uses spatio-temporal self-attention to extract features from clips and estimate the motions in an end-to-end manner. Our approach achieves competitive performance against state-of-the-art geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the DeepVO implementation that is widely adopted in the visual odometry community. The code is publicly available at https://github.com/aofrancani/TSformer-VO.
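The abstract describes the architecture only at a high level. The sketch below illustrates, in PyTorch, how a TimeSformer-style divided space-time attention backbone can map a short clip of frames to the 6-DoF motions between consecutive frames. It is a minimal sketch under assumed hyperparameters (embedding size, patch size, clip length) and illustrative class names; it is not the authors' implementation, which is available at the repository linked above.

# Minimal illustrative sketch (not the authors' code): a TimeSformer-style
# divided space-time attention backbone mapping a clip of frames to the
# 6-DoF motions between consecutive frames. All hyperparameters and class
# names are assumptions for illustration only.
import torch
import torch.nn as nn


class DividedSpaceTimeBlock(nn.Module):
    """One transformer block with separate temporal and spatial self-attention."""

    def __init__(self, dim=384, heads=6):
        super().__init__()
        self.norm_t = nn.LayerNorm(dim)
        self.attn_t = nn.MultiheadAttention(dim, heads, batch_first=True)  # across frames
        self.norm_s = nn.LayerNorm(dim)
        self.attn_s = nn.MultiheadAttention(dim, heads, batch_first=True)  # within a frame
        self.norm_m = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))

    def forward(self, x):  # x: (B, T, N, D) = batch, frames, patches per frame, channels
        B, T, N, D = x.shape
        # Temporal attention: each spatial patch position attends across the T frames.
        t = x.permute(0, 2, 1, 3).reshape(B * N, T, D)
        tn = self.norm_t(t)
        t = self.attn_t(tn, tn, tn)[0]
        x = x + t.reshape(B, N, T, D).permute(0, 2, 1, 3)
        # Spatial attention: the N patches of each frame attend to each other.
        s = x.reshape(B * T, N, D)
        sn = self.norm_s(s)
        s = self.attn_s(sn, sn, sn)[0]
        x = x + s.reshape(B, T, N, D)
        return x + self.mlp(self.norm_m(x))


class ClipToPose(nn.Module):
    """Patch-embed a clip, apply space-time attention, regress 6-DoF motions."""

    def __init__(self, clip_len=2, img_size=192, patch=16, dim=384, depth=4):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        n_patches = (img_size // patch) ** 2
        self.pos = nn.Parameter(torch.zeros(1, clip_len, n_patches, dim))
        self.blocks = nn.ModuleList([DividedSpaceTimeBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, 6 * (clip_len - 1))  # one 6-DoF motion per frame pair

    def forward(self, clip):  # clip: (B, T, 3, H, W)
        B, T, _, H, W = clip.shape
        x = self.embed(clip.flatten(0, 1))          # (B*T, dim, H/patch, W/patch)
        x = x.flatten(2).transpose(1, 2)            # (B*T, n_patches, dim)
        x = x.reshape(B, T, x.shape[1], x.shape[2]) + self.pos
        for blk in self.blocks:
            x = blk(x)
        return self.head(x.mean(dim=(1, 2)))        # (B, 6 * (T - 1)): translation + rotation


motions = ClipToPose()(torch.randn(1, 2, 3, 192, 192))  # -> tensor of shape (1, 6)

Factorizing attention into a temporal pass (each patch position attends across frames) and a spatial pass (patches attend within a frame) keeps the cost roughly N*T^2 + T*N^2 instead of the (T*N)^2 of joint space-time attention, which is the usual motivation for divided space-time attention in video transformers.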
Pages: 13959-13971
Number of pages: 13