SWformer-VO: A Monocular Visual Odometry Model Based on Swin Transformer

Cited by: 3
Authors
Wu, Zhigang [1]
Zhu, Yaohui [1]
Affiliations
[1] Jiangxi University of Science and Technology, School of Energy and Mechanical Engineering, Nanchang 330013, China
Keywords
Deep learning; monocular visual odometry; transformer; depth
DOI
10.1109/LRA.2024.3384911
Chinese Library Classification
TP24 [Robotics]
Discipline Codes
080202; 1405
Abstract
This letter introduces SWformer-VO, a monocular visual odometry network that uses the Swin Transformer as its backbone. Trained end-to-end on a modest volume of image-sequence data, it directly estimates the six-degree-of-freedom camera pose from a monocular camera. SWformer-VO introduces an embedding module called "Mixture Embed", which fuses each pair of consecutive images into a single frame and converts it into tokens that are passed to the backbone; this replaces traditional temporal-sequence schemes by addressing the problem at the image level. Building on this design, the backbone's parameters are further tuned, and experiments examine how the number of layers and the depth of the backbone affect accuracy. On the KITTI dataset, SWformer-VO achieves higher accuracy than common deep learning-based methods introduced in recent years, including SfMLearner, DeepVO, TSformer-VO, Depth-VO-Feat, GeoNet, and Masked GANs. Its effectiveness is also validated on a self-collected dataset of nine indoor corridor routes for visual odometry.
Pages: 4766-4773
Number of pages: 8
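
The abstract describes the core mechanism only at a high level. Below is a minimal PyTorch sketch of that idea, assuming the "Mixture Embed" pair fusion is channel-wise concatenation of the two RGB frames and substituting a plain TransformerEncoder for the Swin backbone; the class names (MixtureEmbed, PoseRegressor) and all hyperparameters are illustrative assumptions, not the authors' released code.

```python
# Hedged sketch of the SWformer-VO idea from the abstract: fuse two consecutive
# frames into one input ("Mixture Embed"), tokenize it, run a transformer
# backbone, and regress a 6-DoF relative pose. Fusion by channel concatenation
# and the plain TransformerEncoder stand-in for Swin are assumptions.
import torch
import torch.nn as nn


class MixtureEmbed(nn.Module):
    """Fuse an image pair into patch tokens (assumed channel concatenation)."""

    def __init__(self, patch_size=16, embed_dim=96):
        super().__init__()
        # 6 input channels: two stacked RGB frames.
        self.proj = nn.Conv2d(6, embed_dim, kernel_size=patch_size, stride=patch_size)

    def forward(self, frame_t, frame_t1):
        x = torch.cat([frame_t, frame_t1], dim=1)   # (B, 6, H, W)
        x = self.proj(x)                            # (B, C, H/p, W/p)
        return x.flatten(2).transpose(1, 2)         # (B, N_tokens, C)


class PoseRegressor(nn.Module):
    """Tokens -> transformer backbone -> 6-DoF pose (3 translation + 3 rotation)."""

    def __init__(self, embed_dim=96, depth=4, heads=4):
        super().__init__()
        self.embed = MixtureEmbed(embed_dim=embed_dim)
        # Stand-in for the Swin Transformer backbone used in the paper.
        layer = nn.TransformerEncoderLayer(d_model=embed_dim, nhead=heads,
                                           batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(embed_dim, 6)

    def forward(self, frame_t, frame_t1):
        tokens = self.backbone(self.embed(frame_t, frame_t1))
        return self.head(tokens.mean(dim=1))        # (B, 6)


if __name__ == "__main__":
    model = PoseRegressor()
    a, b = torch.randn(2, 3, 224, 224), torch.randn(2, 3, 224, 224)
    print(model(a, b).shape)  # torch.Size([2, 6])
```

Because the pair is fused before tokenization, the backbone sees inter-frame motion cues in a single forward pass, which is how the abstract's "image-level" treatment can replace a recurrent temporal-sequence scheme.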