SWformer-VO: A Monocular Visual Odometry Model Based on Swin Transformer

Cited by: 3
Authors
Wu, Zhigang [1 ]
Zhu, Yaohui [1 ]
Affiliations
[1] Jiangxi Univ Sci & Technol, Sch Energy & Mech Engn, Nanchang 330013, Peoples R China
Keywords
Deep learning; monocular visual odometry; transformer; depth
DOI
10.1109/LRA.2024.3384911
CLC number
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
This letter introduces SWformer-VO, a novel monocular visual odometry network built on a Swin Transformer backbone. It directly estimates the six-degrees-of-freedom camera pose from monocular image sequences in an end-to-end manner, using only a modest volume of training data. SWformer-VO introduces an embedding module called "Mixture Embed", which fuses each pair of consecutive images into a single frame and converts it into tokens fed to the backbone network, replacing traditional temporal-sequence schemes by handling the problem at the image level. On this foundation, the backbone's parameters are progressively tuned, and experiments examine how the number of layers and the depth of the backbone affect accuracy. On the KITTI dataset, SWformer-VO achieves higher accuracy than common deep-learning-based methods introduced in recent years, such as SfMLearner, DeepVO, TSformer-VO, Depth-VO-Feat, GeoNet, and Masked GANs. Its effectiveness is further validated on a self-collected dataset of nine indoor corridor routes for visual odometry.
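The abstract describes the "Mixture Embed" scheme only at a high level. A minimal, hypothetical sketch of that idea (not the authors' implementation) is shown below; the timm-provided Swin-T backbone, the 6-channel early fusion of the two frames, and the Euler-angle pose parameterization are all assumptions, not details confirmed by the letter.

```python
# Hypothetical sketch of a Mixture-Embed-style monocular VO model:
# two consecutive RGB frames are fused into one 6-channel "frame",
# patch-embedded into tokens by a Swin-style backbone, and a head
# regresses the 6-DoF relative pose. All names here are illustrative.
import torch
import torch.nn as nn
import timm  # assumed available; provides Swin Transformer backbones


class MixtureEmbedVO(nn.Module):
    def __init__(self):
        super().__init__()
        # in_chans=6: the channel-concatenated pair of RGB frames.
        # num_classes=6: translation (tx, ty, tz) + rotation (rx, ry, rz),
        # assuming an Euler-angle parameterization of relative pose.
        self.backbone = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=False,
            in_chans=6, num_classes=6,
        )

    def forward(self, frame_t, frame_t1):
        # frame_t, frame_t1: (B, 3, 224, 224) consecutive frames.
        fused = torch.cat([frame_t, frame_t1], dim=1)  # (B, 6, 224, 224)
        return self.backbone(fused)  # (B, 6) relative pose estimate


if __name__ == "__main__":
    model = MixtureEmbedVO()
    a = torch.randn(2, 3, 224, 224)
    b = torch.randn(2, 3, 224, 224)
    print(model(a, b).shape)  # torch.Size([2, 6])
```

Fusing the pair before patch embedding is what lets a purely spatial backbone exploit the two-frame temporal cue, which matches the abstract's claim of handling the problem "at the image level" rather than with a temporal-sequence model.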
Pages: 4766-4773 (8 pages)
Related Papers
50 items in total
  • [21] Effective Feature-Based Downward-Facing Monocular Visual Odometry
    Lee, Hoyong
    Lee, Hakjun
    Kwak, Inveom
    Sung, Chiwon
    Han, Soohee
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2024, 32 (01) : 266 - 273
  • [22] Monocular visual odometry: A cross-spectral image fusion based approach
    Sappa, Angel D.
    Aguilera, Cristhian A.
    Carvajal Ayala, Juan A.
    Oliveira, Miguel
    Romero, Dennis
    Vintimilla, Boris X.
    Toledo, Ricardo
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2016, 85 : 26 - 36
  • [23] TransDiff: medical image segmentation method based on Swin Transformer with diffusion probabilistic model
    Liu, Xiaoxiao
    Zhao, Yan
    Wang, Shigang
    Wei, Jian
    APPLIED INTELLIGENCE, 2024, 54 (08) : 6543 - 6557
  • [24] Feasibility Study on Optical Image Modulation Based Parallax Generator for Monocular Visual Odometry
    Lee, Minyoung
    Cha, Moo Hyun
    Park, Chan Seok
    Kim, Kyung-Soo
    2020 20TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2020, : 581 - 585
  • [25] Robust self-supervised monocular visual odometry based on prediction-update pose estimation network
    Xiu, Haixin
    Liang, Yiyou
    Zeng, Hui
    Li, Qing
    Liu, Hongmin
    Fan, Bin
    Li, Chen
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 116
  • [26] Incorporating a Wheeled Vehicle Model in a New Monocular Visual Odometry Algorithm for Dynamic Outdoor Environments
    Jiang, Yanhua
    Xiong, Guangming
    Chen, Huiyan
    Lee, Dah-Jye
    SENSORS, 2014, 14 (09) : 16159 - 16180
  • [27] Swin-FlowNet: Flow field oriented optimization aided by a CNN and Swin-Transformer based model
    Wang, Xiao
    Zou, Shufan
    Jiang, Yi
    Zhang, Laiping
    Deng, Xiaogang
    JOURNAL OF COMPUTATIONAL SCIENCE, 2023, 72
  • [28] Optimization of automatic classification for women’s pants based on the Swin Transformer model
    Pan, Shaoqin
    Wang, Ping
    Yang, Chen
    FASHION AND TEXTILES, 11 (1)
  • [29] Adaptive-search template matching technique based on vehicle acceleration for monocular visual odometry system
    Aqel, Mohammad O. A.
    Marhaban, Mohammad H.
    Saripan, M. Iqbal
    Ismail, Napsiah Bt.
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2016, 11 (06) : 739 - 752
  • [30] Hybrid self-supervised monocular visual odometry system based on spatio-temporal features
    Yuan, Shuangjie
    Zhang, Jun
    Lin, Yujia
    Yang, Lu
    ELECTRONIC RESEARCH ARCHIVE, 2024, 32 (05): : 3543 - 3568