Navigation with Large Language Models: Semantic Guesswork as a Heuristic for Planning

Cited: 0
Authors
Shah, Dhruv [1 ]
Equi, Michael [1 ]
Osinski, Blazej [3 ]
Xia, Fei [2 ]
Ichter, Brian [2 ]
Levine, Sergey [1 ,2 ]
Affiliations
[1] Univ Calif Berkeley, Berkeley, CA 94720 USA
[2] Google DeepMind, London, England
[3] Univ Warsaw, Warsaw, Poland
Source
CONFERENCE ON ROBOT LEARNING, VOL 229 | 2023 / Vol. 229
Keywords
navigation; language models; planning; semantic scene understanding; vision
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Navigation in unfamiliar environments presents a major challenge for robots: while mapping and planning techniques can be used to build up a representation of the world, quickly discovering a path to a desired goal in unfamiliar settings with such methods often requires lengthy mapping and exploration. Humans can rapidly navigate new environments, particularly indoor environments that are laid out logically, by leveraging semantics: e.g., a kitchen often adjoins a living room, an exit sign indicates the way out, and so forth. Language models can provide robots with such knowledge, but directly using language models to instruct a robot how to reach some destination can also be impractical: while language models might produce a narrative about how to reach some goal, because they are not grounded in real-world observations, this narrative might be arbitrarily wrong. Therefore, in this paper we study how the "semantic guesswork" produced by language models can be utilized as a guiding heuristic for planning algorithms. Our method, Language Frontier Guide (LFG), uses the language model to bias exploration of novel real-world environments by incorporating the semantic knowledge stored in language models as a search heuristic for planning with either topological or metric maps. We evaluate LFG in challenging real-world environments and simulated benchmarks, outperforming uninformed exploration and other ways of using language models.
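The core idea summarized in the abstract, using a language model's semantic score as a heuristic that biases frontier selection rather than as a direct controller, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frontier labels, the `llm_semantic_score` lookup (which stands in for a real language-model query), and the weighting scheme are all hypothetical.

```python
# Hedged sketch: bias frontier-based exploration by combining a geometric
# travel cost with a language-model "semantic guesswork" score per frontier.
from dataclasses import dataclass

@dataclass
class Frontier:
    label: str        # semantic description of what lies at this frontier
    distance: float   # travel cost from the robot's current pose

def llm_semantic_score(label: str, goal: str) -> float:
    """Stand-in for a language-model query: how likely is the goal to lie
    beyond a frontier described by `label`? A toy lookup replaces the
    real LLM call; unseen pairs get a small default prior."""
    toy_prior = {
        ("hallway with exit sign", "exit"): 0.9,
        ("kitchen doorway", "exit"): 0.2,
    }
    return toy_prior.get((label, goal), 0.1)

def choose_frontier(frontiers, goal, weight=5.0):
    """Best-first selection: distance is penalized, semantic score rewarded,
    so a farther frontier can still win if the LLM prefers it."""
    return min(
        frontiers,
        key=lambda f: f.distance - weight * llm_semantic_score(f.label, goal),
    )

frontiers = [
    Frontier("kitchen doorway", 2.0),
    Frontier("hallway with exit sign", 4.0),
]
best = choose_frontier(frontiers, "exit")
print(best.label)  # the exit-sign hallway wins despite being farther
```

Because the LLM score only reorders candidate frontiers, a wrong semantic guess degrades gracefully: the planner still explores, just in a less favorable order, which is the failure mode the paper contrasts with directly executing LLM instructions.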
Pages: 17
Related Papers
42 records in total
  • [1] Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
    Anderson, Peter
    Wu, Qi
    Teney, Damien
    Bruce, Jake
    Johnson, Mark
    Sunderhauf, Niko
    Reid, Ian
    Gould, Stephen
    van den Hengel, Anton
    [J]. 2018 IEEE/CVF CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2018, : 3674 - 3683
  • [2] Chaplot, Devendra Singh, 2020, Proceedings of the 16th European Conference on Computer Vision, p. 309
  • [3] Chen B., 2022, arXiv
  • [4] Chen X., 2022, arXiv
  • [5] Vision for mobile robot navigation: A survey
    DeSouza, GN
    Kak, AC
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2002, 24 (02) : 237 - 267
  • [6] Task and Motion Planning with Large Language Models for Object Rearrangement
    Ding, Yan
    Zhang, Xiaohan
    Paxton, Chris
    Zhang, Shiqi
    [J]. 2023 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS, IROS, 2023, : 2086 - 2092
  • [7] Dorbala Vishnu Sashank, 2023, Can an embodied agent find your "cat-shaped mug"? llm-guided exploration for zero-shot object navigation
  • [8] Driess D., 2023, arXiv
  • [9] Navigating to objects in the real world
    Gervet, Theophile
    Chintala, Soumith
    Batra, Dhruv
    Malik, Jitendra
    Chaplot, Devendra Singh
    [J]. SCIENCE ROBOTICS, 2023, 8 (79)
  • [10] Hirose N., 2019, IEEE ROBOTICS AUTOMA