Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Cited by: 1

Authors:

Francani, Andre O. [1]

Maximo, Marcos R. O. A. [1]

Affiliations:

[1] Aeronaut Inst Technol, Autonomous Computat Syst Lab, BR-12228900 Sao Jose Dos Campos, SP, Brazil

Source:

IEEE ACCESS | 2025 | Vol. 13

Keywords:

Transformers; Visual odometry; Feature extraction; Deep learning; Computer architecture; 6-DOF; Pipelines; Odometry; Vectors; Context modeling; monocular visual odometry; transformer; video understanding;

DOI:

10.1109/ACCESS.2025.3531667

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Estimating a camera's pose from the images of a single camera is a traditional task in mobile robotics and autonomous vehicles. This problem is called monocular visual odometry, and it often relies on geometric approaches that require considerable engineering effort for a specific scenario. Deep learning methods have been shown to generalize after proper training on a large amount of available data. Transformer-based architectures have dominated the state of the art in natural language processing and in computer vision tasks such as image and video understanding. In this work, we treat monocular visual odometry as a video understanding task to estimate the 6 degrees of freedom of a camera's pose. We contribute by presenting the TSformer-VO model, which is based on spatio-temporal self-attention mechanisms, to extract features from clips and estimate the motions in an end-to-end manner. Our approach achieved competitive state-of-the-art performance compared with geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the DeepVO implementation that is widely accepted in the visual odometry community. The code is publicly available at https://github.com/aofrancani/TSformer-VO.
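
As a rough illustration of what "estimating the motions" feeds into, the sketch below chains frame-to-frame motions into an absolute camera trajectory, which is the standard final step of any visual odometry pipeline. It is not taken from the TSformer-VO code: the helper names (`pose_matrix`, `matmul4`, `accumulate`) and the sample motions are hypothetical, and for brevity each motion uses only a yaw rotation plus a translation rather than all 6 degrees of freedom.

```python
import math

def pose_matrix(yaw, tx, ty, tz):
    """Build a 4x4 homogeneous transform from a yaw rotation (about the y axis)
    and a translation. Hypothetical helper; a full 6-DoF pose would also
    include roll and pitch."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c,   0.0, s,   tx],
        [0.0, 1.0, 0.0, ty],
        [-s,  0.0, c,   tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def matmul4(a, b):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def accumulate(relative_poses):
    """Chain relative motions (pose of frame k+1 expressed in frame k)
    into absolute poses, starting from the identity."""
    current = pose_matrix(0.0, 0.0, 0.0, 0.0)
    trajectory = [current]
    for rel in relative_poses:
        current = matmul4(current, rel)
        trajectory.append(current)
    return trajectory

# Assumed sample input: four identical motions, each moving one unit forward
# along z and ending with a 90 degree yaw turn. Together they trace a square.
steps = [pose_matrix(math.pi / 2, 0.0, 0.0, 1.0)] * 4
trajectory = accumulate(steps)
positions = [(T[0][3], T[1][3], T[2][3]) for T in trajectory]
```

In a learned pipeline, the entries of `steps` would be replaced by the per-frame motions predicted by the network. Because the motions are multiplied together, any error in one estimate propagates to all later poses, which is why drift is a central concern in odometry.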

Pages: 13959-13971

Page count: 13

50 records in total

[1] SWformer-VO: A Monocular Visual Odometry Model Based on Swin Transformer
Wu, Zhigang
Zhu, Yaohui
IEEE ROBOTICS AND AUTOMATION LETTERS, 2024, 9 (05) : 4766 - 4773
[2] Dense Prediction Transformer for Scale Estimation in Monocular Visual Odometry
Francani, Andre O.
Maximo, Marcos R. O. A.
2022 LATIN AMERICAN ROBOTICS SYMPOSIUM (LARS), 2022 BRAZILIAN SYMPOSIUM ON ROBOTICS (SBR), AND 2022 WORKSHOP ON ROBOTICS IN EDUCATION (WRE), 2022, : 312 - 317
[3] From Local Understanding to Global Regression in Monocular Visual Odometry
Esfahani, Mandi Abolfazli
Wu, Keyu
Yuan, Shenghai
Wang, Han
INTERNATIONAL JOURNAL OF PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE, 2020, 34 (01)
[4] Unsupervised Monocular Visual Odometry Based on Confidence Evaluation
Liu, Yiling
Wang, Hesheng
Wang, Jingchuan
Wang, Xinlei
IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2022, 23 (06) : 5387 - 5396
[5] Monocular Visual Odometry Based on Hybrid Parameterization
Mohamed, Sherif A. S.
Haghbayan, Mohammad-Hashem
Heikkonen, Jukka
Tenhunen, Hannu
Plosila, Juha
TWELFTH INTERNATIONAL CONFERENCE ON MACHINE VISION (ICMV 2019), 2020, 11433
[6] Transformer guided geometry model for flow-based unsupervised visual odometry
Li, Xiangyu
Hou, Yonghong
Wang, Pichao
Gao, Zhimin
Xu, Mingliang
Li, Wanqing
NEURAL COMPUTING & APPLICATIONS, 2021, 33 (13) : 8031 - 8042
[8] A Novel Approach to Improve the Precision of Monocular Visual Odometry
Xiao, Chen
Zhu, Xiaorui
Feng, Wei
Ou, Yongsheng
2015 IEEE INTERNATIONAL CONFERENCE ON INFORMATION AND AUTOMATION, 2015, : 392 - 397
[9] A New Approach to Train Convolutional Neural Networks for Monocular Visual Odometry
Esfahani, Mandi Abolfazli
Wu, Keyu
Yuan, Shenghai
Wang, Han
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION AND ARTIFICIAL INTELLIGENCE (ICPRAI 2018), 2018, : 66 - 71
[10] Transformer-Based Seismic Image Enhancement: A Novel Approach for Improved Resolution
Park, Jin-Yeong
Saad, Omar M.
Oh, Ju-Won
Alkhalifah, Tariq
IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, 2025, 63
