Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Source
2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022) | 2022
Keywords
Vision Transformer; Image Segmentation; Visual Servoing;
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
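The coarse, patch-level segmentation described in the abstract can be sketched as follows: a frozen ViT-style backbone embeds each non-overlapping 8x8 patch, and a lightweight head classifies every patch token independently, yielding one label per patch rather than per pixel. This is a minimal illustrative sketch, not the authors' implementation: the random-projection "backbone", the embedding width, and the class set are all placeholder assumptions standing in for a pretrained self-supervised ViT and a trained head.

```python
import numpy as np

PATCH = 8        # ViT patch size, per the abstract
EMBED_DIM = 64   # hypothetical embedding width (real ViTs are wider)
N_CLASSES = 3    # illustrative classes, e.g. drivable / marking / obstacle

rng = np.random.default_rng(0)
# Stand-ins for a frozen pretrained patch embedding and a trained linear head.
W_embed = rng.standard_normal((PATCH * PATCH * 3, EMBED_DIM)) * 0.02
W_head = rng.standard_normal((EMBED_DIM, N_CLASSES)) * 0.02

def patchify(img):
    """Split an HxWx3 image into flattened non-overlapping 8x8 patches."""
    h, w, c = img.shape
    gh, gw = h // PATCH, w // PATCH
    p = img[:gh * PATCH, :gw * PATCH].reshape(gh, PATCH, gw, PATCH, c)
    return p.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1), (gh, gw)

def coarse_segment(img):
    """Return an (H/8, W/8) map with one class prediction per patch."""
    patches, (gh, gw) = patchify(img)
    tokens = patches @ W_embed   # frozen "backbone" patch embeddings
    logits = tokens @ W_head     # per-token linear segmentation head
    return logits.argmax(axis=1).reshape(gh, gw)

img = rng.random((64, 96, 3)).astype(np.float32)
seg = coarse_segment(img)
print(seg.shape)  # (8, 12): one label per 8x8 patch
```

Because each patch is classified independently, inference resolution can be traded off against speed simply by downscaling the input image before patchifying, which matches the granularity/frame-rate trade-off the abstract describes.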
Pages: 197 - 204
Page count: 8