SWformer-VO: A Monocular Visual Odometry Model Based on Swin Transformer

Cited by: 3
Authors
Wu, Zhigang [1 ]
Zhu, Yaohui [1 ]
Affiliations
[1] Jiangxi Univ Sci & Technol, Sch Energy & Mech Engn, Nanchang 330013, Peoples R China
Keywords
Deep learning; monocular visual odometry; transformer; depth
DOI
10.1109/LRA.2024.3384911
CLC number
TP24 [Robotics]
Subject classification codes
080202; 1405
Abstract
This letter introduces SWformer-VO, a novel monocular visual odometry network that uses the Swin Transformer as its backbone. It estimates the six-degree-of-freedom camera pose directly from monocular image sequences in an end-to-end manner, using only a modest amount of sequence data. SWformer-VO introduces an embedding module called "Mixture Embed", which fuses each pair of consecutive images into a single frame and converts it into tokens that are passed to the backbone network. This design replaces conventional temporal-sequence schemes by handling the temporal relationship at the image level. On this foundation, the backbone parameters are further tuned, and experiments examine how the number of layers and the depth of the backbone affect accuracy. On the KITTI dataset, SWformer-VO achieves higher accuracy than common deep learning-based methods introduced in recent years, such as SFMlearner, Deep-VO, TSformer-VO, Depth-VO-Feat, GeoNet, and Masked GANs. Its effectiveness is also validated on a self-collected visual odometry dataset consisting of nine indoor corridor routes.
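The abstract describes the mechanism only at a high level, so the following PyTorch sketch illustrates the idea under stated assumptions: two consecutive RGB frames are fused by channel concatenation into one 6-channel input (standing in for "Mixture Embed"), split into patch tokens, passed through a transformer backbone, and regressed to a 6-DoF relative pose. All names and hyperparameters here (MixtureEmbedSketch, embed_dim, depth, and the plain nn.TransformerEncoder used in place of the Swin backbone) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of the pipeline described in the abstract. This is NOT the
# authors' code: the fusion strategy, backbone, and hyperparameters are assumed.
import torch
import torch.nn as nn


class MixtureEmbedSketch(nn.Module):
    """Fuse an image pair by channel concatenation and split it into patch tokens."""

    def __init__(self, patch_size: int = 4, embed_dim: int = 96):
        super().__init__()
        # 6 input channels = two stacked RGB frames (assumed fusion scheme).
        self.proj = nn.Conv2d(6, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([frame_t, frame_t1], dim=1)          # (B, 6, H, W)
        tokens = self.proj(fused).flatten(2).transpose(1, 2)   # (B, N, embed_dim)
        return tokens


class MonoVOSketch(nn.Module):
    """Toy end-to-end pose regressor: fused tokens -> transformer -> 6-DoF pose."""

    def __init__(self, embed_dim: int = 96, depth: int = 4, heads: int = 4):
        super().__init__()
        self.embed = MixtureEmbedSketch(embed_dim=embed_dim)
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads, batch_first=True)
        # Plain encoder as a stand-in for the Swin Transformer backbone.
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        self.pose_head = nn.Linear(embed_dim, 6)  # 3 translation + 3 rotation parameters

    def forward(self, frame_t: torch.Tensor, frame_t1: torch.Tensor) -> torch.Tensor:
        tokens = self.backbone(self.embed(frame_t, frame_t1))
        return self.pose_head(tokens.mean(dim=1))  # global-average pool over tokens


if __name__ == "__main__":
    model = MonoVOSketch()
    a, b = torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64)
    print(model(a, b).shape)  # torch.Size([2, 6])
```

In this sketch the relative pose between the two frames comes from a single forward pass over the fused image pair, which is the image-level alternative to recurrent temporal modeling that the abstract attributes to the Mixture Embed design.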
Pages: 4766-4773
Page count: 8