Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Source
2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022) | 2022
Keywords
Vision Transformer; Image Segmentation; Visual Servoing;
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
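The abstract describes coarse segmentation predicted per 8x8 ViT patch rather than per pixel. A minimal sketch of that idea, not the paper's code: a linear classifier applied to frozen ViT patch embeddings, where the output forms a grid with one label per patch (the dimensions, class count, and random stand-in features below are illustrative assumptions).

```python
import numpy as np

# Hypothetical sketch: patch-level segmentation as a linear head over frozen
# ViT patch embeddings. With 8x8 patches, a 480x640 image yields a 60x80 grid
# of D-dimensional tokens; each token gets one coarse class label.

def patch_segmentation(tokens: np.ndarray, W: np.ndarray, b: np.ndarray,
                       grid_hw: tuple) -> np.ndarray:
    """tokens: (N, D) patch embeddings; returns a (grid_h, grid_w) label map."""
    logits = tokens @ W + b          # (N, num_classes) per-patch class scores
    labels = logits.argmax(axis=1)   # hard label for each patch
    return labels.reshape(grid_hw)   # coarse segmentation grid

rng = np.random.default_rng(0)
D, C = 384, 3                        # embed dim, classes (e.g. lane / off-lane / obstacle)
tokens = rng.standard_normal((60 * 80, D))   # stand-in for backbone features
W, b = rng.standard_normal((D, C)), np.zeros(C)
seg = patch_segmentation(tokens, W, b, (60, 80))
print(seg.shape)  # (60, 80)
```

Running the same backbone on a downsampled input shrinks the patch grid, which is one way the paper's trade-off between prediction granularity and inference speed could be realized.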
Pages: 197-204
Page count: 8
Related Papers
50 records in total
  • [1] Self-supervised vision transformers for semantic segmentation
    Gu, Xianfan
    Hu, Yingdong
    Wen, Chuan
    Gao, Yang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [2] Self-supervised Vision Transformers for Writer Retrieval
    Raven, Tim
    Matei, Arthur
    Fink, Gernot A.
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 380 - 396
  • [3] Exploring Self-Supervised Vision Transformers for Gait Recognition in the Wild
    Cosma, Adrian
    Catruna, Andy
    Radoi, Emilian
    SENSORS, 2023, 23 (05)
  • [4] Self-Supervised Augmented Vision Transformers for Remote Physiological Measurement
    Pang, Liyu
    Li, Xiaoou
    Wang, Zhen
    Lei, Xueyi
    Pei, Yulong
    2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 623 - 627
  • [5] Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers
    Hu, Hao
    Baldassarre, Federico
    Azizpour, Hossein
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III, 2023, 13715 : 409 - 426
  • [6] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
    Wang, Yi
    Albrecht, Conrad M.
    Zhu, Xiao Xiang
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142
  • [7] Self-supervised Vision Transformers for 3D pose estimation of novel objects
    Thalhammer, Stefan
    Weibel, Jean-Baptiste
    Vincze, Markus
    Garcia-Rodriguez, Jose
    IMAGE AND VISION COMPUTING, 2023, 139
  • [8] A Cross-Domain Threat Screening and Localization Framework Using Vision Transformers and Self-supervised Learning
    Nasim, Ammara
    Akram, Muhammad Usman
    Khan, Asad Mansoor
    Khan, Muhammad Belal Afsar
    Hassan, Taimur
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024
  • [9] Self-Supervised Domain Adaptation for Computer Vision Tasks
    Xu, Jiaolong
    Xiao, Liang
    Lopez, Antonio M.
    IEEE ACCESS, 2019, 7 : 156694 - 156706
  • [10] Perceptual Hashing Using Pretrained Vision Transformers
    De Geest, Jelle
    De Smet, Patrick
    Bonetto, Lucio
    Lambert, Peter
    Van Wallendael, Glenn
    Mareen, Hannes
    2024 IEEE GAMING, ENTERTAINMENT, AND MEDIA CONFERENCE, GEM 2024, 2024, : 19 - 24