Generation of Robot Manipulation Plans Using Generative Large Language Models

Cited by: 0
Authors
Toberg, Jan-Philipp [1 ,2 ]
Cimiano, Philipp [1 ,2 ]
Affiliations
[1] Univ Bielefeld, Ctr Cognit Interact Technol CITEC, Bielefeld, Germany
[2] Univ Bielefeld, Joint Res Ctr Cooperat & Cognit Enabled CoAI JRC, Bielefeld, Germany
Source
2023 SEVENTH IEEE INTERNATIONAL CONFERENCE ON ROBOTIC COMPUTING, IRC 2023 | 2023
Keywords
Robot Plan Generation; Large Language Models; Action Similarity; CRAM; GPT;
DOI
10.1109/IRC59093.2023.00039
Chinese Library Classification (CLC) number
TP301 [Theory and Methods]
Discipline classification code
081202
Abstract
Designing plans that allow robots to carry out actions such as grasping an object or cutting a fruit is a time-consuming activity requiring specific skills and knowledge. The recent success of Generative Large Language Models (LLMs) has opened new avenues for code generation. In order to evaluate the ability of LLMs to generate code representing manipulation plans, we carry out experiments with different LLMs in the CRAM framework. In our experimental framework, we ask an LLM such as ChatGPT or GPT-4 to generate a plan for a specific target action given the plan (called a designator within CRAM) for a given reference action as an example. We evaluate the generated designators against a ground-truth designator using machine translation and code generation metrics, as well as assessing whether they compile. We find that GPT-4 slightly outperforms ChatGPT, but both models achieve solid performance across all evaluated metrics. However, only about 36% of the generated designators compile successfully. In addition, we assess how the chosen reference action influences the code generation quality as well as the compilation success. Unexpectedly, action similarity negatively correlates with compilation success. With respect to the metrics, we obtain either a positive or a negative correlation depending on the model used. Finally, we describe our attempt to use ChatGPT in an interactive fashion to incrementally refine the initially generated designator. On the basis of our observations we conclude that the behaviour of ChatGPT is not reliable and robust enough to support the incremental refinement of a designator.
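The abstract describes comparing generated designators against ground-truth designators with machine translation and code generation metrics. As a minimal sketch of that kind of evaluation, the snippet below scores a generated Lisp-like designator against a reference one with a simple unigram-precision overlap; the actual metrics used in the paper (and the real CRAM designator syntax) differ, and the designator strings here are purely illustrative assumptions.

```python
# Illustrative sketch: token-overlap scoring of a generated plan
# (designator) against a ground-truth one. The paper uses established
# MT/code-generation metrics; this unigram precision is a stand-in,
# and the Lisp-like strings below are invented examples.

def tokenize(code: str) -> list[str]:
    # Split Lisp-like code on whitespace, treating parentheses as tokens.
    return code.replace("(", " ( ").replace(")", " ) ").split()

def unigram_precision(generated: str, reference: str) -> float:
    # Fraction of generated tokens that can be matched (with
    # multiplicity) to tokens in the reference.
    gen, ref = tokenize(generated), tokenize(reference)
    if not gen:
        return 0.0
    ref_counts: dict[str, int] = {}
    for tok in ref:
        ref_counts[tok] = ref_counts.get(tok, 0) + 1
    matched = 0
    for tok in gen:
        if ref_counts.get(tok, 0) > 0:
            matched += 1
            ref_counts[tok] -= 1
    return matched / len(gen)

reference = "(an action (type cutting) (object (an object (type fruit))))"
generated = "(an action (type cutting) (object (an object (type apple))))"
print(round(unigram_precision(generated, reference), 2))  # → 0.95
```

In the paper's setting such surface-overlap scores are complemented by a compilation check, since a designator can be textually close to the reference yet still fail to compile.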
Pages: 190 - 197
Number of pages: 8