Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Keywords
Vision Transformer; Image Segmentation; Visual Servoing
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
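The abstract's core idea — classifying each 8x8 ViT patch token independently to obtain a coarse segmentation, then mapping the patch grid back to pixel resolution — can be illustrated with a minimal sketch. This is not the authors' code: the feature array stands in for frozen self-supervised ViT patch embeddings, and the linear head's weights (which the paper would fit on the ~70 annotated images) are random placeholders; all dimensions are illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the paper's config):
# 64x64 image, 8x8 patches -> an 8x8 grid of patch tokens,
# D-dimensional embeddings, C segmentation classes.
H, W, P, D, C = 64, 64, 8, 384, 2
n_h, n_w = H // P, W // P

rng = np.random.default_rng(0)

# Stand-in for frozen self-supervised ViT patch features, one token per patch.
tokens = rng.normal(size=(n_h * n_w, D))

# Lightweight linear segmentation head; weights are random placeholders here,
# where the paper would train them on the small annotated set.
W_head = rng.normal(size=(D, C)) * 0.01
b_head = np.zeros(C)

logits = tokens @ W_head + b_head                 # (n_h * n_w, C)
patch_labels = logits.argmax(axis=1).reshape(n_h, n_w)  # coarse 8x8 prediction

# Nearest-neighbour upsampling of patch labels back to pixel resolution.
seg = np.kron(patch_labels, np.ones((P, P), dtype=int))
print(seg.shape)  # (64, 64)
```

Because the head acts on each token independently, running the ViT at a different input resolution changes only the patch-grid size, which matches the abstract's point about trading prediction granularity against real-time constraints.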
Pages: 197-204
Page count: 8