LFENav: LLM-Based Frontiers Exploration for Visual Semantic Navigation

Times Cited: 0
Authors
Shi, Yuhong [1 ,2 ,3 ]
Liu, Jianyi [1 ,2 ,3 ]
Zheng, Xinhu [4 ]
Affiliations
[1] Natl Key Lab Human Machine Hybrid Augmented Intel, Xian 710049, Shaanxi, Peoples R China
[2] Natl Engn Res Ctr Visual Informat & Applicat, Xian 710049, Shaanxi, Peoples R China
[3] Xi An Jiao Tong Univ, Inst Artificial Intelligence & Robot, Xian 710049, Shaanxi, Peoples R China
[4] Hong Kong Univ Sci & Technol Guangzhou, Intelligent Transportat Thrust, Guangzhou, Peoples R China
Source
ARTIFICIAL INTELLIGENCE APPLICATIONS AND INNOVATIONS, PT IV, AIAI 2024 | 2024 / Vol. 714
Funding
National Natural Science Foundation of China
Keywords
visual semantic navigation; frontier exploration; large language models;
DOI
10.1007/978-3-031-63223-5_28
CLC Classification
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Robot navigation in an unknown environment is a challenging task, owing to the lack of spatial awareness and semantic understanding of the environment. Previous works mostly rely on learning-based approaches, which require large amounts of training data and lack generalization ability. The emergence of Large Language Models (LLMs) provides a new way to achieve semantic understanding. This paper proposes LLM-based Frontiers Exploration for visual semantic Navigation (LFENav), which leverages the rich semantic prior knowledge of LLMs to find the next subgoals given a natural language instruction. First, a semantic map is incrementally constructed and frontiers are redefined from the observed RGB-D images. A prompt mechanism is designed to exploit the Chain-of-Thought (CoT) capability of LLMs, and geometric costs are used to compensate for the LLMs' limited understanding of the spatial layout of scenes. Based on the above, a novel exploration policy integrates LLM scores and geometric costs to select frontiers worth exploring. Experiments on the Habitat-Matterport 3D dataset show that the method achieves a success rate of up to 0.638, the best performance among existing methods.
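The exploration policy sketched in the abstract can be illustrated with a minimal example: each candidate frontier receives an LLM-derived semantic score and a geometric cost, and the policy selects the frontier with the best combined value. The class names, the linear weighting, and the distance normalization below are illustrative assumptions, not the paper's exact formulation.

```python
from dataclasses import dataclass

@dataclass
class Frontier:
    name: str
    llm_score: float  # semantic relevance to the goal, assumed in [0, 1]
    distance: float   # geometric cost: path length from the robot (meters)

def select_frontier(frontiers, alpha=0.7, max_dist=20.0):
    """Pick the frontier maximizing a weighted sum of the LLM score and
    a normalized geometric term (closer frontiers score higher)."""
    def value(f):
        geo = 1.0 - min(f.distance, max_dist) / max_dist
        return alpha * f.llm_score + (1 - alpha) * geo
    return max(frontiers, key=value)

frontiers = [
    Frontier("kitchen doorway", llm_score=0.9, distance=8.0),
    Frontier("hallway corner", llm_score=0.3, distance=2.0),
]
best = select_frontier(frontiers)
print(best.name)  # the semantically relevant frontier wins despite being farther
```

With `alpha = 0.7` the semantic score dominates, so a distant but goal-relevant frontier beats a nearby irrelevant one; the actual trade-off weighting would come from the paper.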
Pages: 375-388
Page count: 14