Transformer-Based Model for Monocular Visual Odometry: A Video Understanding Approach

Cited by: 1

Authors:

Francani, Andre O. [1]

Maximo, Marcos R. O. A. [1]

Affiliations:

[1] Aeronaut Inst Technol, Autonomous Computat Syst Lab, BR-12228900 Sao Jose Dos Campos, SP, Brazil

Source:

IEEE ACCESS | 2025 / Vol. 13

Keywords:

Transformers; Visual odometry; Feature extraction; Deep learning; Computer architecture; 6-DOF; Pipelines; Odometry; Vectors; Context modeling; monocular visual odometry; transformer; video understanding;

DOI:

10.1109/ACCESS.2025.3531667

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

Estimating the camera's pose given images from a single camera is a traditional task in mobile robots and autonomous vehicles. This problem is called monocular visual odometry and often relies on geometric approaches that require considerable engineering effort for a specific scenario. Deep learning methods have been shown to be generalizable after proper training and with a large amount of available data. Transformer-based architectures have dominated the state-of-the-art in natural language processing and computer vision tasks, such as image and video understanding. In this work, we deal with the monocular visual odometry as a video understanding task to estimate the 6 degrees of freedom of a camera's pose. We contribute by presenting the TSformer-VO model based on spatio-temporal self-attention mechanisms to extract features from clips and estimate the motions in an end-to-end manner. Our approach achieved competitive state-of-the-art performance compared with geometry-based and deep learning-based methods on the KITTI visual odometry dataset, outperforming the DeepVO implementation highly accepted in the visual odometry community. The code is publicly available at https://github.com/aofrancani/TSformer-VO.


Pages: 13959-13971

Number of pages: 13
