LLM as Copilot for Coarse-Grained Vision-and-Language Navigation

Authors
Qiao, Yanyuan [1]
Liu, Qianyi [2,3]
Liu, Jiajun [4,5]
Liu, Jing [2,3]
Wu, Qi [1]
Affiliations
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] CSIRO Data61, Eveleigh, Australia
[5] Univ Queensland, Brisbane, Qld, Australia
Source
COMPUTER VISION - ECCV 2024, PT V | 2025 / Vol. 15063
Keywords
Vision-and-Language Navigation; Large Language Models
DOI
10.1007/978-3-031-72652-1_27
Chinese Library Classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Vision-and-Language Navigation (VLN) involves guiding an agent through indoor environments using human-provided textual instructions. Coarse-grained VLN, which uses short, high-level instructions, has gained popularity because it closely mirrors real-world scenarios. However, a significant challenge is that these instructions are often too concise for agents to comprehend and act upon. Previous studies have explored allowing agents to seek assistance during navigation, but they typically offer rigid support drawn from pre-existing datasets or simulators. The advent of Large Language Models (LLMs) presents a novel avenue for aiding VLN agents. This paper introduces VLN-Copilot, a framework that enables agents to actively seek assistance when they become confused, with the LLM serving as a copilot to facilitate navigation. The approach introduces a confusion score that quantifies the level of uncertainty in an agent's action decisions, while the LLM offers detailed, real-time guidance for navigation. Experimental results on two coarse-grained VLN datasets show the efficacy of the method.
Pages: 459 - 476
Page count: 18