Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Keywords
Vision Transformer; Image Segmentation; Visual Servoing
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification: TP18 [Artificial Intelligence Theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
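The abstract's core idea — classifying each 8x8 ViT patch token independently to obtain a coarse segmentation, then mapping the patch grid back to pixel resolution — can be illustrated with a minimal sketch. This is not the authors' code: the feature array stands in for frozen self-supervised ViT patch embeddings, and the linear head's weights (which the paper would fit on the ~70 annotated images) are random placeholders; all dimensions are illustrative assumptions.

```python
import numpy as np

# Illustrative dimensions (assumptions, not from the paper's config):
# 64x64 image, 8x8 patches -> an 8x8 grid of patch tokens,
# D-dimensional embeddings, C segmentation classes.
H, W, P, D, C = 64, 64, 8, 384, 2
n_h, n_w = H // P, W // P

rng = np.random.default_rng(0)

# Stand-in for frozen self-supervised ViT patch features, one token per patch.
tokens = rng.normal(size=(n_h * n_w, D))

# Lightweight linear segmentation head; weights are random placeholders here,
# where the paper would train them on the small annotated set.
W_head = rng.normal(size=(D, C)) * 0.01
b_head = np.zeros(C)

logits = tokens @ W_head + b_head                 # (n_h * n_w, C)
patch_labels = logits.argmax(axis=1).reshape(n_h, n_w)  # coarse 8x8 prediction

# Nearest-neighbour upsampling of patch labels back to pixel resolution.
seg = np.kron(patch_labels, np.ones((P, P), dtype=int))
print(seg.shape)  # (64, 64)
```

Because the head acts on each token independently, running the ViT at a different input resolution changes only the patch-grid size, which matches the abstract's point about trading prediction granularity against real-time constraints.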
Pages: 197-204
Page count: 8