SWformer-VO: A Monocular Visual Odometry Model Based on Swin Transformer

Cited by: 3
Authors
Wu, Zhigang [1 ]
Zhu, Yaohui [1 ]
Affiliations
[1] Jiangxi Univ Sci & Technol, Sch Energy & Mech Engn, Nanchang 330013, Peoples R China
Keywords
Deep learning; monocular visual odometry; transformer; depth
DOI
10.1109/LRA.2024.3384911
CLC number
TP24 [Robotics]
Discipline codes
080202; 1405
Abstract
This letter introduces SWformer-VO, a novel monocular visual odometry network built on a Swin Transformer backbone. It directly estimates the six-degrees-of-freedom camera pose from monocular image sequences in an end-to-end manner, using only a modest volume of training data. SWformer-VO introduces an embedding module called "Mixture Embed", which fuses each pair of consecutive images into a single frame and converts it into tokens fed to the backbone network, replacing traditional temporal-sequence schemes by handling the problem at the image level. On this foundation, the backbone's parameters are progressively tuned, and experiments examine how the number of layers and the depth of the backbone affect accuracy. On the KITTI dataset, SWformer-VO achieves higher accuracy than common deep-learning-based methods introduced in recent years, such as SfMLearner, DeepVO, TSformer-VO, Depth-VO-Feat, GeoNet, and Masked GANs. Its effectiveness is further validated on a self-collected dataset of nine indoor corridor routes for visual odometry.
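The abstract describes the "Mixture Embed" scheme only at a high level. A minimal, hypothetical sketch of that idea (not the authors' implementation) is shown below; the timm-provided Swin-T backbone, the 6-channel early fusion of the two frames, and the Euler-angle pose parameterization are all assumptions, not details confirmed by the letter.

```python
# Hypothetical sketch of a Mixture-Embed-style monocular VO model:
# two consecutive RGB frames are fused into one 6-channel "frame",
# patch-embedded into tokens by a Swin-style backbone, and a head
# regresses the 6-DoF relative pose. All names here are illustrative.
import torch
import torch.nn as nn
import timm  # assumed available; provides Swin Transformer backbones


class MixtureEmbedVO(nn.Module):
    def __init__(self):
        super().__init__()
        # in_chans=6: the channel-concatenated pair of RGB frames.
        # num_classes=6: translation (tx, ty, tz) + rotation (rx, ry, rz),
        # assuming an Euler-angle parameterization of relative pose.
        self.backbone = timm.create_model(
            "swin_tiny_patch4_window7_224", pretrained=False,
            in_chans=6, num_classes=6,
        )

    def forward(self, frame_t, frame_t1):
        # frame_t, frame_t1: (B, 3, 224, 224) consecutive frames.
        fused = torch.cat([frame_t, frame_t1], dim=1)  # (B, 6, 224, 224)
        return self.backbone(fused)  # (B, 6) relative pose estimate


if __name__ == "__main__":
    model = MixtureEmbedVO()
    a = torch.randn(2, 3, 224, 224)
    b = torch.randn(2, 3, 224, 224)
    print(model(a, b).shape)  # torch.Size([2, 6])
```

Fusing the pair before patch embedding is what lets a purely spatial backbone exploit the two-frame temporal cue, which matches the abstract's claim of handling the problem "at the image level" rather than with a temporal-sequence model.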
Pages: 4766-4773 (8 pages)
Related Papers
50 items in total
  • [21] Effective Feature-Based Downward-Facing Monocular Visual Odometry
    Lee, Hoyong
    Lee, Hakjun
    Kwak, Inveom
    Sung, Chiwon
    Han, Soohee
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2024, 32 (01) : 266 - 273
  • [22] Monocular visual odometry: A cross-spectral image fusion based approach
    Sappa, Angel D.
    Aguilera, Cristhian A.
    Carvajal Ayala, Juan A.
    Oliveira, Miguel
    Romero, Dennis
    Vintimilla, Boris X.
    Toledo, Ricardo
    ROBOTICS AND AUTONOMOUS SYSTEMS, 2016, 85 : 26 - 36
  • [23] TransDiff: medical image segmentation method based on Swin Transformer with diffusion probabilistic model
    Liu, Xiaoxiao
    Zhao, Yan
    Wang, Shigang
    Wei, Jian
    APPLIED INTELLIGENCE, 2024, 54 (08) : 6543 - 6557
  • [24] Feasibility Study on Optical Image Modulation Based Parallax Generator for Monocular Visual Odometry
    Lee, Minyoung
    Cha, Moo Hyun
    Park, Chan Seok
    Kim, Kyung-Soo
    2020 20TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2020, : 581 - 585
  • [25] Robust self-supervised monocular visual odometry based on prediction-update pose estimation network
    Xiu, Haixin
    Liang, Yiyou
    Zeng, Hui
    Li, Qing
    Liu, Hongmin
    Fan, Bin
    Li, Chen
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2022, 116
  • [26] Incorporating a Wheeled Vehicle Model in a New Monocular Visual Odometry Algorithm for Dynamic Outdoor Environments
    Jiang, Yanhua
    Xiong, Guangming
    Chen, Huiyan
    Lee, Dah-Jye
    SENSORS, 2014, 14 (09) : 16159 - 16180
  • [27] Swin-FlowNet: Flow field oriented optimization aided by a CNN and Swin-Transformer based model
    Wang, Xiao
    Zou, Shufan
    Jiang, Yi
    Zhang, Laiping
    Deng, Xiaogang
    JOURNAL OF COMPUTATIONAL SCIENCE, 2023, 72
  • [28] Optimization of automatic classification for women’s pants based on the Swin Transformer model
    Pan, Shaoqin
    Wang, Ping
    Yang, Chen
    FASHION AND TEXTILES, 11 (1)
  • [29] Adaptive-search template matching technique based on vehicle acceleration for monocular visual odometry system
    Aqel, Mohammad O. A.
    Marhaban, Mohammad H.
    Saripan, M. Iqbal
    Ismail, Napsiah Bt.
    IEEJ TRANSACTIONS ON ELECTRICAL AND ELECTRONIC ENGINEERING, 2016, 11 (06) : 739 - 752
  • [30] Hybrid self-supervised monocular visual odometry system based on spatio-temporal features
    Yuan, Shuangjie
    Zhang, Jun
    Lin, Yujia
    Yang, Lu
    ELECTRONIC RESEARCH ARCHIVE, 2024, 32 (05): : 3543 - 3568