Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Cited by: 1
Authors
Francani, Andre O. [1 ]
Maximo, Marcos R. O. A. [1 ]
Affiliations
[1] Aeronaut Inst Technol, Autonomous Computat Syst Lab, BR-12228900 Sao Jose Dos Campos, SP, Brazil
Source
IEEE ACCESS | 2025 / Vol. 13
Keywords
Transformers; Visual odometry; Feature extraction; Deep learning; Computer architecture; 6-DOF; Pipelines; Odometry; Vectors; Context modeling; monocular visual odometry; transformer; video understanding;
DOI
10.1109/ACCESS.2025.3531667
Chinese Library Classification (CLC) Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
Estimating a camera's pose from the images of a single camera is a traditional task in mobile robotics and autonomous driving. This problem, called monocular visual odometry, often relies on geometric approaches that require considerable engineering effort for each specific scenario. Deep learning methods have been shown to generalize well after proper training on large amounts of data. Transformer-based architectures dominate the state of the art in natural language processing and in computer vision tasks such as image and video understanding. In this work, we treat monocular visual odometry as a video understanding task and estimate the 6 degrees of freedom of the camera's pose. We contribute the TSformer-VO model, which uses spatio-temporal self-attention mechanisms to extract features from clips and estimate the motions in an end-to-end manner. Our approach achieves performance competitive with state-of-the-art geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the widely adopted DeepVO implementation. The code is publicly available at https://github.com/aofrancani/TSformer-VO.
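To make the abstract's core idea concrete, below is a minimal NumPy sketch of divided (spatio-temporal) self-attention applied to a clip of patch tokens, ending in a 6-DoF regression head. This is an illustration of the general technique only, not the authors' implementation: all shapes, weight initializations, and the single-head, single-block structure are assumptions for demonstration; the real model is in the linked repository.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # x: (n_tokens, d) -> single-head scaled dot-product attention
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    return softmax(scores) @ v

def divided_space_time_block(tokens, params):
    # tokens: (T, N, D) = T frames, N patches per frame, D-dim embeddings
    T, N, D = tokens.shape
    # 1) temporal attention: each spatial location attends across frames
    t_out = np.empty_like(tokens)
    for n in range(N):
        t_out[:, n] = self_attention(tokens[:, n], *params["time"])
    # 2) spatial attention: each frame attends across its own patches
    s_out = np.empty_like(t_out)
    for t in range(T):
        s_out[t] = self_attention(t_out[t], *params["space"])
    return tokens + s_out  # residual connection

rng = np.random.default_rng(0)
T, N, D = 3, 16, 32  # e.g. a 3-frame clip, 4x4 patch grid (illustrative sizes)
make_w = lambda: [rng.standard_normal((D, D)) * 0.05 for _ in range(3)]
params = {"time": make_w(), "space": make_w()}

clip_tokens = rng.standard_normal((T, N, D))
feat = divided_space_time_block(clip_tokens, params).mean(axis=(0, 1))  # (D,)
W_head = rng.standard_normal((D, 6)) * 0.05   # hypothetical regression head
pose_6dof = feat @ W_head                     # 3 translation + 3 rotation params
print(pose_6dof.shape)                        # (6,)
```

Factoring attention into a temporal pass followed by a spatial pass keeps the cost linear in T·N per pass, rather than attending over all T·N tokens jointly, which is the motivation behind divided space-time attention in video transformers.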
Pages: 13959-13971 (13 pages)