VLFM: Vision-Language Frontier Maps for Zero-Shot Semantic Navigation

Cited by: 6
Authors
Yokoyama, Naoki [1 ,2 ]
Ha, Sehoon [2 ]
Batra, Dhruv [2 ]
Wang, Jiuguang [1 ]
Bucher, Bernadette [1 ]
Affiliations
[1] Boston Dynamics AI Institute, Boston, MA, USA
[2] Georgia Institute of Technology, Atlanta, GA 30332, USA
Source
2024 IEEE International Conference on Robotics and Automation (ICRA 2024) | 2024
DOI
10.1109/ICRA57147.2024.10610712
Chinese Library Classification (CLC)
TP [automation technology, computer technology]
Discipline Classification Code
0812
Abstract
Understanding how humans leverage semantic knowledge to navigate unfamiliar environments and decide where to explore next is pivotal for developing robots capable of human-like search behaviors. We introduce a zero-shot navigation approach, Vision-Language Frontier Maps (VLFM), which is inspired by human reasoning and designed to navigate towards unseen semantic objects in novel environments. VLFM builds occupancy maps from depth observations to identify frontiers, and leverages RGB observations and a pre-trained vision-language model to generate a language-grounded value map. VLFM then uses this map to identify the most promising frontier to explore for finding an instance of a given target object category. We evaluate VLFM in photo-realistic environments from the Gibson, Habitat-Matterport 3D (HM3D), and Matterport 3D (MP3D) datasets within the Habitat simulator. Remarkably, VLFM achieves state-of-the-art results on all three datasets as measured by success weighted by path length (SPL) for the Object Goal Navigation task. Furthermore, we show that VLFM's zero-shot nature enables it to be readily deployed on real-world robots such as the Boston Dynamics Spot mobile manipulation platform. We deploy VLFM on Spot and demonstrate its capability to efficiently navigate to target objects within an office building in the real world, without any prior knowledge of the environment. The accomplishments of VLFM underscore the promising potential of vision-language models in advancing the field of semantic navigation. Videos of real-world deployment can be viewed at naoki.io/vlfm.
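As a rough illustration of the frontier-selection step the abstract describes, the minimal sketch below scores each candidate frontier by looking up its cell in a language-grounded value map and picks the highest-scoring one. How the value map is populated from VLM similarity scores, and all names here (select_best_frontier, value_map, frontiers), are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def select_best_frontier(
    value_map: np.ndarray,              # (H, W) grid of VLM-derived values in [0, 1]
    frontiers: list[tuple[int, int]],   # candidate frontier cells as (row, col)
) -> tuple[int, int]:
    """Return the frontier whose map cell has the highest semantic value.

    Hypothetical sketch of VLFM's frontier selection: the paper's actual
    map construction and scoring details are not reproduced here.
    """
    if not frontiers:
        raise ValueError("no frontiers left to explore")
    # Indexing value_map with a (row, col) tuple yields that cell's score.
    return max(frontiers, key=lambda cell: value_map[cell])

# Toy usage: a 5x5 value map with two frontier candidates.
vmap = np.zeros((5, 5))
vmap[1, 3] = 0.8   # e.g., high image-text similarity for the target category
vmap[4, 0] = 0.2
print(select_best_frontier(vmap, [(1, 3), (4, 0)]))  # -> (1, 3)
```

In this reading, exploration alternates between updating the occupancy and value maps from new depth/RGB observations and re-running the selection above until the target object is detected.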
Pages: 42-48
Page count: 7