Grounding Language with Visual Affordances over Unstructured Data

Cited by: 26
Authors
Mees, Oier [1 ]
Borja-Diaz, Jessica [1 ]
Burgard, Wolfram [2 ]
Affiliations
[1] Univ Freiburg, Freiburg, Germany
[2] Univ Technol Nuremberg, Nurnberg, Germany
Source
2023 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA 2023) | 2023
DOI
10.1109/ICRA48891.2023.10160396
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Recent works have shown that Large Language Models (LLMs) can be applied to ground natural language to a wide variety of robot skills. However, in practice, learning multi-task, language-conditioned robotic skills typically requires large-scale data collection and frequent human intervention to reset the environment or help correct the current policies. In this work, we propose a novel approach to efficiently learn general-purpose language-conditioned robot skills from unstructured, offline and reset-free data in the real world by exploiting a self-supervised visuo-lingual affordance model, which requires annotating as little as 1% of the total data with language. We evaluate our method in extensive experiments on both simulated and real-world robotic tasks, achieving state-of-the-art performance on the challenging CALVIN benchmark and learning over 25 distinct visuomotor manipulation tasks with a single policy in the real world. We find that when paired with LLMs to break down abstract natural language instructions into subgoals via few-shot prompting, our method is capable of completing long-horizon, multi-tier tasks in the real world, while requiring an order of magnitude less data than previous approaches. Code and videos are available at http://hulc2.cs.uni-freiburg.de.
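As a minimal sketch of the subgoal-decomposition step described in the abstract: the example below shows how an abstract instruction could be broken into subgoal strings via few-shot prompting of an LLM. The `llm` callable, the `FEW_SHOT_EXAMPLES`, and the helper names `build_prompt` and `decompose` are illustrative assumptions for this sketch, not the authors' actual prompts, code, or API.

```python
# Hypothetical sketch of few-shot subgoal decomposition with an LLM.
# Assumption: `llm` is any text-completion callable (prompt string -> completion string).
from typing import Callable, List

# Illustrative worked examples used as the few-shot context (not from the paper).
FEW_SHOT_EXAMPLES = [
    ("tidy up the workspace",
     ["pick up the red block",
      "place the red block in the drawer",
      "close the drawer"]),
    ("turn on the light",
     ["push the button to toggle the LED"]),
]

def build_prompt(instruction: str) -> str:
    """Assemble a few-shot prompt from the worked examples plus the new instruction."""
    lines = ["Decompose each task into short robot subgoals."]
    for task, subgoals in FEW_SHOT_EXAMPLES:
        lines.append(f"Task: {task}")
        lines.extend(f"- {s}" for s in subgoals)
    lines.append(f"Task: {instruction}")
    return "\n".join(lines)

def decompose(instruction: str, llm: Callable[[str], str]) -> List[str]:
    """Query the LLM and keep one subgoal per '- ' line of the completion."""
    completion = llm(build_prompt(instruction))
    return [line[2:].strip() for line in completion.splitlines() if line.startswith("- ")]
```

In the pipeline the paper describes, each resulting subgoal string would then be grounded by the visuo-lingual affordance model and executed by the language-conditioned policy; how that hand-off is implemented is not shown here.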
Pages: 11576 - 11582
Number of pages: 7