VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Cited by: 6
Authors
Yokoyama, Naoki [1 ,2 ]
Ha, Sehoon [2 ]
Batra, Dhruv [2 ]
Wang, Jiuguang [1 ]
Bucher, Bernadette [1 ]
Affiliations
[1] Boston Dynamics AI Institute, Boston, MA, USA
[2] Georgia Institute of Technology, Atlanta, GA 30332, USA
Source
2024 IEEE International Conference on Robotics and Automation (ICRA 2024) | 2024
DOI
10.1109/ICRA57147.2024.10610712
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM), which is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments. VLFM builds occupancy maps from depth observations to identify frontiers, and leverages RGB observations and a pre-trained vision-language model to generate a language-grounded value map. VLFM then uses this map to identify the most promising frontier to explore for finding an instance of a given target object category. We evaluate VLFM in photo-realistic environments from the Gibson, Habitat-Matterport 3D (HM3D), and Matterport 3D (MP3D) datasets within the Habitat simulator. Remarkably, VLFM achieves state-of-the-art results on all three datasets as measured by success weighted by path length (SPL) for the Object Goal Navigation task. Furthermore, we show that VLFM's zero-shot nature enables it to be readily deployed on real-world robots such as the Boston Dynamics Spot mobile manipulation platform. We deploy VLFM on Spot and demonstrate its capability to efficiently navigate to target objects within an office building in the real world, without any prior knowledge of the environment. The accomplishments of VLFM underscore the promising potential of vision-language models in advancing the field of semantic navigation. Videos of real-world deployment can be viewed at naoki.io/vlfm.
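As a rough illustration of the frontier-selection step the abstract describes, the minimal sketch below scores each candidate frontier by looking up its cell in a language-grounded value map and picks the highest-scoring one. How the value map is populated from VLM similarity scores, and all names here (select_best_frontier, value_map, frontiers), are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_best_frontier(
    value_map: np.ndarray,              # (H, W) grid of VLM-derived values in [0, 1]
    frontiers: list[tuple[int, int]],   # candidate frontier cells as (row, col)
) -> tuple[int, int]:
    """Return the frontier whose map cell has the highest semantic value.

    Hypothetical sketch of VLFM's frontier selection: the paper's actual
    map construction and scoring details are not reproduced here.
    """
    if not frontiers:
        raise ValueError("no frontiers left to explore")
    # Indexing value_map with a (row, col) tuple yields that cell's score.
    return max(frontiers, key=lambda cell: value_map[cell])

# Toy usage: a 5x5 value map with two frontier candidates.
vmap = np.zeros((5, 5))
vmap[1, 3] = 0.8   # e.g., high image-text similarity for the target category
vmap[4, 0] = 0.2
print(select_best_frontier(vmap, [(1, 3), (4, 0)]))  # -> (1, 3)
```

In this reading, exploration alternates between updating the occupancy and value maps from new depth/RGB observations and re-running the selection above until the target object is detected.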
Pages: 42-48
Page count: 7