ChatGPT for Robotics: Design Principles and Model Abilities

Cited by: 141
Authors
Vemprala, Sai H. [1]
Bonatti, Rogerio [2]
Bucker, Arthur [3]
Kapoor, Ashish [1]
Affiliations
[1] Scaled Fdn, Kirkland, WA 98033 USA
[2] Microsoft Corp, Redmond, WA 98072 USA
[3] Carnegie Mellon Univ, Robot Inst, Pittsburgh, PA 15213 USA
Keywords
Large language models; open systems; artificial intelligence; robotics; language understanding; code generation; perception
DOI
10.1109/ACCESS.2024.3387941
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
This paper presents an experimental study of the use of OpenAI's ChatGPT for robotics applications. We outline a strategy that combines design principles for prompt engineering with the creation of a high-level function library, which allows ChatGPT to adapt to different robotics tasks, simulators, and form factors. We focus our evaluations on the effectiveness of different prompt engineering techniques and dialog strategies for executing various types of robotics tasks. We explore ChatGPT's ability to use free-form dialog, parse XML tags, and synthesize code, as well as the use of task-specific prompting functions and closed-loop reasoning through dialogue. Our study covers a range of tasks within the robotics domain, from basic logical, geometrical, and mathematical reasoning to complex domains such as aerial navigation, manipulation, and embodied agents. We show that ChatGPT can be effective at solving several such tasks while allowing users to interact with it primarily via natural language instructions. In addition to these studies, we introduce an open-sourced research tool called PromptCraft, which contains a platform where researchers can collaboratively upload and vote on examples of good prompting schemes for robotics applications, as well as a sample robotics simulator with ChatGPT integration, making it easier for users to get started with ChatGPT for robotics.
Videos and blog: aka.ms/ChatGPT-Robotics
PromptCraft and AirSim-ChatGPT code: https://github.com/microsoft/PromptCraft-Robotics
Pages: 55682-55696
Number of pages: 15