Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers

Cited by: 0
Authors
Saavedra-Ruiz, Miguel [1 ]
Morin, Sacha [1 ]
Paull, Liam [1 ]
Affiliations
[1] Univ Montreal, Mila Quebec AI Inst, DIRO, Montreal, PQ, Canada
Source
2022 19TH CONFERENCE ON ROBOTS AND VISION (CRV 2022) | 2022
Keywords
Vision Transformer; Image Segmentation; Visual Servoing;
DOI
10.1109/CRV55824.2022.00033
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this work, we consider the problem of learning a perception model for monocular robot navigation using few annotated images. Using a Vision Transformer (ViT) pretrained with a label-free self-supervised method, we successfully train a coarse image segmentation model for the Duckietown environment using 70 training images. Our model performs coarse image segmentation at the 8x8 patch level, and the inference resolution can be adjusted to balance prediction granularity and real-time perception constraints. We study how best to adapt a ViT to our task and environment, and find that some lightweight architectures can yield good single-image segmentations at a usable frame rate, even on CPU. The resulting perception model is used as the backbone for a simple yet robust visual servoing agent, which we deploy on a differential drive mobile robot to perform two tasks: lane following and obstacle avoidance.
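The abstract describes coarse segmentation predicted per 8x8 ViT patch rather than per pixel. A minimal sketch of that idea, not the paper's code: a linear classifier applied to frozen ViT patch embeddings, where the output forms a grid with one label per patch (the dimensions, class count, and random stand-in features below are illustrative assumptions).

```python
import numpy as np

# Hypothetical sketch: patch-level segmentation as a linear head over frozen
# ViT patch embeddings. With 8x8 patches, a 480x640 image yields a 60x80 grid
# of D-dimensional tokens; each token gets one coarse class label.

def patch_segmentation(tokens: np.ndarray, W: np.ndarray, b: np.ndarray,
                       grid_hw: tuple) -> np.ndarray:
    """tokens: (N, D) patch embeddings; returns a (grid_h, grid_w) label map."""
    logits = tokens @ W + b          # (N, num_classes) per-patch class scores
    labels = logits.argmax(axis=1)   # hard label for each patch
    return labels.reshape(grid_hw)   # coarse segmentation grid

rng = np.random.default_rng(0)
D, C = 384, 3                        # embed dim, classes (e.g. lane / off-lane / obstacle)
tokens = rng.standard_normal((60 * 80, D))   # stand-in for backbone features
W, b = rng.standard_normal((D, C)), np.zeros(C)
seg = patch_segmentation(tokens, W, b, (60, 80))
print(seg.shape)  # (60, 80)
```

Running the same backbone on a downsampled input shrinks the patch grid, which is one way the paper's trade-off between prediction granularity and inference speed could be realized.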
Pages: 197-204
Page count: 8
Related Papers
50 records in total
  • [1] Self-supervised vision transformers for semantic segmentation
    Gu, Xianfan
    Hu, Yingdong
    Wen, Chuan
    Gao, Yang
    COMPUTER VISION AND IMAGE UNDERSTANDING, 2025, 251
  • [2] Self-supervised Vision Transformers for Writer Retrieval
    Raven, Tim
    Matei, Arthur
    Fink, Gernot A.
    DOCUMENT ANALYSIS AND RECOGNITION-ICDAR 2024, PT II, 2024, 14805 : 380 - 396
  • [3] Exploring Self-Supervised Vision Transformers for Gait Recognition in the Wild
    Cosma, Adrian
    Catruna, Andy
    Radoi, Emilian
    SENSORS, 2023, 23 (05)
  • [4] Self-Supervised Augmented Vision Transformers for Remote Physiological Measurement
    Pang, Liyu
    Li, Xiaoou
    Wang, Zhen
    Lei, Xueyi
    Pei, Yulong
    2023 8TH INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYTICS, ICCCBDA, 2023, : 623 - 627
  • [5] Learnable Masked Tokens for Improved Transferability of Self-supervised Vision Transformers
    Hu, Hao
    Baldassarre, Federico
    Azizpour, Hossein
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2022, PT III, 2023, 13715 : 409 - 426
  • [6] SELF-SUPERVISED VISION TRANSFORMERS FOR JOINT SAR-OPTICAL REPRESENTATION LEARNING
    Wang, Yi
    Albrecht, Conrad M.
    Zhu, Xiao Xiang
    2022 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM (IGARSS 2022), 2022, : 139 - 142
  • [7] Self-supervised Vision Transformers for 3D pose estimation of novel objects
    Thalhammer, Stefan
    Weibel, Jean-Baptiste
    Vincze, Markus
    Garcia-Rodriguez, Jose
    IMAGE AND VISION COMPUTING, 2023, 139
  • [8] A Cross-Domain Threat Screening and Localization Framework Using Vision Transformers and Self-supervised Learning
    Nasim, Ammara
    Akram, Muhammad Usman
    Khan, Asad Mansoor
    Khan, Muhammad Belal Afsar
    Hassan, Taimur
    2024 14TH INTERNATIONAL CONFERENCE ON PATTERN RECOGNITION SYSTEMS, ICPRS, 2024
  • [9] Self-Supervised Domain Adaptation for Computer Vision Tasks
    Xu, Jiaolong
    Xiao, Liang
    Lopez, Antonio M.
    IEEE ACCESS, 2019, 7 : 156694 - 156706
  • [10] Perceptual Hashing Using Pretrained Vision Transformers
    De Geest, Jelle
    De Smet, Patrick
    Bonetto, Lucio
    Lambert, Peter
    Van Wallendael, Glenn
    Mareen, Hannes
    2024 IEEE GAMING, ENTERTAINMENT, AND MEDIA CONFERENCE, GEM 2024, 2024, : 19 - 24