Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Source
2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022) | 2022
Keywords
Vision Transformer; Image Segmentation; Visual Servoing;
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
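The coarse, patch-level segmentation described in the abstract can be sketched as follows: a frozen ViT-style backbone embeds each non-overlapping 8x8 patch, and a lightweight head classifies every patch token independently, yielding one label per patch rather than per pixel. This is a minimal illustrative sketch, not the authors' implementation: the random-projection "backbone", the embedding width, and the class set are all placeholder assumptions standing in for a pretrained self-supervised ViT and a trained head.

```python
import numpy as np

PATCH = 8        # ViT patch size, per the abstract
EMBED_DIM = 64   # hypothetical embedding width (real ViTs are wider)
N_CLASSES = 3    # illustrative classes, e.g. drivable / marking / obstacle

rng = np.random.default_rng(0)
# Stand-ins for a frozen pretrained patch embedding and a trained linear head.
W_embed = rng.standard_normal((PATCH * PATCH * 3, EMBED_DIM)) * 0.02
W_head = rng.standard_normal((EMBED_DIM, N_CLASSES)) * 0.02

def patchify(img):
    """Split an HxWx3 image into flattened non-overlapping 8x8 patches."""
    h, w, c = img.shape
    gh, gw = h // PATCH, w // PATCH
    p = img[:gh * PATCH, :gw * PATCH].reshape(gh, PATCH, gw, PATCH, c)
    return p.transpose(0, 2, 1, 3, 4).reshape(gh * gw, -1), (gh, gw)

def coarse_segment(img):
    """Return an (H/8, W/8) map with one class prediction per patch."""
    patches, (gh, gw) = patchify(img)
    tokens = patches @ W_embed   # frozen "backbone" patch embeddings
    logits = tokens @ W_head     # per-token linear segmentation head
    return logits.argmax(axis=1).reshape(gh, gw)

img = rng.random((64, 96, 3)).astype(np.float32)
seg = coarse_segment(img)
print(seg.shape)  # (8, 12): one label per 8x8 patch
```

Because each patch is classified independently, inference resolution can be traded off against speed simply by downscaling the input image before patchifying, which matches the granularity/frame-rate trade-off the abstract describes.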
Pages: 197 - 204
Page count: 8