Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Cited by: 1
Authors
Francani, Andre O. [1]
Maximo, Marcos R. O. A. [1]
Affiliations
[1] Aeronaut Inst Technol, Autonomous Computat Syst Lab, BR-12228900 Sao Jose Dos Campos, SP, Brazil
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Transformers; Visual odometry; Feature extraction; Deep learning; Computer architecture; 6-DOF; Pipelines; Odometry; Vectors; Context modeling; monocular visual odometry; transformer; video understanding;
DOI
10.1109/ACCESS.2025.3531667
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Estimating a camera's pose from the images of a single camera is a traditional task in mobile robotics and autonomous vehicles. This problem, called monocular visual odometry, often relies on geometric approaches that require considerable engineering effort for a specific scenario. Deep learning methods have been shown to generalize after proper training with a large amount of available data. Transformer-based architectures have dominated the state of the art in natural language processing and in computer vision tasks such as image and video understanding. In this work, we treat monocular visual odometry as a video understanding task to estimate the 6 degrees of freedom of a camera's pose. We contribute the TSformer-VO model, which uses spatio-temporal self-attention mechanisms to extract features from clips and estimate the motions in an end-to-end manner. Our approach achieved performance competitive with state-of-the-art geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the DeepVO implementation that is widely accepted in the visual odometry community. The code is publicly available at https://github.com/aofrancani/TSformer-VO.
Pages: 13959-13971
Number of pages: 13