SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning

Cited by: 0

Authors:

Rana, Krishan [1]

Haviland, Jesse [1, 2]

Garg, Sourav [3]

Abou-Chakra, Jad [1]

Reid, Ian [3]

Sunderhauf, Niko [1]

Affiliations:

[1] Queensland Univ Technol, QUT Ctr Robot, Brisbane, Qld, Australia

[2] CSIRO Data61 Robot & Autonomous Syst Grp, Pullenvale, Australia

[3] Univ Adelaide, Adelaide, SA, Australia

Source:

CONFERENCE ON ROBOT LEARNING, VOL 229 | 2023 / Volume 229

Funding:

Australian Research Council;

Keywords:

DOI:

Not available

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Large language models (LLMs) have demonstrated impressive results in developing generalist planning agents for diverse tasks. However, grounding these plans in expansive, multi-floor, and multi-room environments presents a significant challenge for robotics. We introduce SayPlan, a scalable approach to LLM-based, large-scale task planning for robotics using 3D scene graph (3DSG) representations. To ensure the scalability of our approach, we: (1) exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a semantic search for task-relevant subgraphs from a smaller, collapsed representation of the full graph; (2) reduce the planning horizon for the LLM by integrating a classical path planner; and (3) introduce an iterative replanning pipeline that refines the initial plan using feedback from a scene graph simulator, correcting infeasible actions and avoiding planning failures. We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects, and show that our approach is capable of grounding large-scale, long-horizon task plans from abstract, natural language instructions for a mobile manipulator robot to execute. We provide real robot video demonstrations on our project page: sayplan.github.io.
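As a rough illustration of points (1) and (2) in the abstract, the following Python sketch shows a collapsed view of a hierarchical scene graph in which children become visible only when their parent is expanded, plus a classical shortest-path planner (Dijkstra's algorithm, chosen here as an assumption) over room connectivity. It is a toy under stated assumptions, not the authors' implementation: the node names, distances, and the functions `visible_nodes` and `shortest_path` are hypothetical, and the LLM and the replanning loop of point (3) are omitted.

```python
import heapq

# Hypothetical toy scene graph: floor -> rooms -> assets/objects.
# All names and distances are illustrative, not taken from the paper.
HIERARCHY = {
    "floor_1": ["kitchen", "corridor", "office"],
    "kitchen": ["fridge", "mug"],
    "office": ["desk", "stapler"],
}

# Hypothetical traversable edges between rooms, weighted by distance in metres.
ROOM_EDGES = {
    "kitchen": {"corridor": 4.0},
    "corridor": {"kitchen": 4.0, "office": 6.0},
    "office": {"corridor": 6.0},
}


def visible_nodes(root, expanded):
    """Return the collapsed view: children appear only under expanded nodes."""
    view = [root]
    stack = [root]
    while stack:
        node = stack.pop()
        if node in expanded:
            for child in HIERARCHY.get(node, []):
                view.append(child)
                stack.append(child)
    return view


def shortest_path(edges, start, goal):
    """Dijkstra's algorithm over a dict-of-dicts weighted graph."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in edges.get(node, {}).items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    # No route exists between start and goal.
    return float("inf"), []


if __name__ == "__main__":
    # A searcher would first see only the collapsed graph, then expand rooms.
    print(visible_nodes("floor_1", {"floor_1"}))
    print(visible_nodes("floor_1", {"floor_1", "kitchen"}))
    # Navigation between rooms is delegated to the classical planner.
    print(shortest_path(ROOM_EDGES, "kitchen", "office"))
```

In this sketch, the set passed as `expanded` stands in for the expand and contract decisions that a search procedure would make, so only a small part of the graph needs to be visible at any one time. Route finding is handled entirely by `shortest_path`, which keeps navigation steps out of the high-level plan.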

Pages: 50
