LLM as Copilot for Coarse-Grained Vision-and-Language Navigation
Cited by: 0
Authors:
Qiao, Yanyuan [1]
Liu, Qianyi [2,3]
Liu, Jiajun [4,5]
Liu, Jing [2,3]
Wu, Qi [1]
Affiliations:
[1] Univ Adelaide, Australian Inst Machine Learning, Adelaide, SA, Australia
[2] Chinese Acad Sci, Inst Automat, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing, Peoples R China
[4] CSIRO Data61, Eveleigh, Australia
[5] Univ Queensland, Brisbane, Qld, Australia
Keywords: Vision-and-Language Navigation; Large Language Models
DOI: 10.1007/978-3-031-72652-1_27
Chinese Library Classification (CLC) number:
TP18 [Artificial Intelligence Theory];
Subject classification codes:
081104; 0812; 0835; 1405
Abstract:
Vision-and-Language Navigation (VLN) involves guiding an agent through indoor environments using human-provided textual instructions. Coarse-grained VLN, with short and high-level instructions, has gained popularity as it closely mirrors real-world scenarios. However, a significant challenge is that these instructions are often too concise for agents to comprehend and act upon. Previous studies have explored allowing agents to seek assistance during navigation, but they typically offer rigid support drawn from pre-existing datasets or simulators. The advent of Large Language Models (LLMs) presents a novel avenue for aiding VLN agents. This paper introduces VLN-Copilot, a framework that enables agents to actively seek assistance when they encounter confusion, with the LLM serving as a copilot to facilitate navigation. The approach introduces a confusion score, which quantifies the level of uncertainty in an agent's action decisions, while the LLM offers detailed, real-time guidance for navigation. Experimental results on two coarse-grained VLN datasets show the efficacy of the method.
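The abstract does not define how the confusion score is computed. As a rough illustration only, the sketch below assumes one plausible choice: the normalized Shannon entropy of the agent's action probabilities, with help requested once the score crosses a threshold. The function names, the entropy-based definition, and the threshold value of 0.8 are all assumptions made for this example and are not taken from the paper.

```python
import math


def confusion_score(action_probs):
    """Normalized Shannon entropy of an action distribution, in [0, 1].

    Assumed definition for illustration; the paper's actual score may differ.
    0 means the agent is certain, 1 means all candidate actions are equally likely.
    """
    n = len(action_probs)
    if n <= 1:
        return 0.0  # a single candidate action leaves nothing to be unsure about
    entropy = -sum(p * math.log(p) for p in action_probs if p > 0)
    return entropy / math.log(n)


def should_ask_copilot(action_probs, threshold=0.8):
    """Return True when the (assumed) score reaches the hypothetical threshold."""
    return confusion_score(action_probs) >= threshold
```

Under these assumptions, a uniform distribution over four candidate actions yields the maximum score and would trigger a request for guidance, while a distribution that puts nearly all its mass on one action would not.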