Learning One-Shot Imitation From Humans Without Humans

Cited by: 45
Authors
Bonardi, Alessandro [1 ]
James, Stephen [2 ]
Davison, Andrew J. [2 ]
Affiliations
[1] Imperial Coll London, Dept Comp, London SW7 2BU, England
[2] Imperial Coll London, Dyson Robot Lab, London SW7 2BU, England
Funding
UK Engineering and Physical Sciences Research Council (EPSRC);
Keywords
Learning from demonstration; deep learning in robotics and automation; perception for grasping and manipulation; TASKS;
DOI
10.1109/LRA.2020.2977835
Chinese Library Classification (CLC)
TP24 [Robotics];
Subject classification codes
080202 ; 1405 ;
Abstract
Humans can naturally learn to execute a new task after seeing it performed by another individual just once, and can then reproduce it in a variety of configurations. Endowing robots with this ability to imitate humans from a third-person view is an immediate and natural way of teaching new tasks. Only recently, through meta-learning, have there been successful attempts at one-shot imitation learning from humans; however, these approaches require considerable human effort to collect real-world training data. But is there a way to remove the need for real-world human demonstrations during training? We show that with Task-Embedded Control Networks we can infer control policies by embedding human demonstrations that condition a control policy, achieving one-shot imitation learning. Importantly, we do not use a real human arm to supply demonstrations during training, but instead leverage domain randomisation in an application that has not been seen before: sim-to-real transfer on humans. Evaluating our approach on pushing and placing tasks in both simulation and the real world, we show that, using only simulation data, we achieve results similar to those of a system trained on real-world data. Videos can be found here: https://sites.google.com/view/tecnets-humans.
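The conditioning mechanism the abstract describes can be illustrated with a minimal sketch. This is not the authors' code: the network weights, dimensions, and function names below are hypothetical placeholders standing in for trained networks, but the structure follows the idea of embedding a demonstration into a normalised task vector that conditions a control policy alongside the current observation.

```python
import numpy as np

rng = np.random.default_rng(0)

OBS_DIM, EMB_DIM, ACT_DIM = 16, 8, 4  # toy dimensions, chosen for illustration

# Hypothetical frozen weights standing in for trained networks.
W_embed = rng.standard_normal((OBS_DIM, EMB_DIM))           # embedding net
W_ctrl = rng.standard_normal((OBS_DIM + EMB_DIM, ACT_DIM))  # control net

def embed_demo(demo_frames: np.ndarray) -> np.ndarray:
    """Embed each demo frame, average over time, and L2-normalise
    to obtain a single task-embedding vector."""
    per_frame = demo_frames @ W_embed      # (T, EMB_DIM)
    task = per_frame.mean(axis=0)
    return task / np.linalg.norm(task)

def policy(obs: np.ndarray, task_emb: np.ndarray) -> np.ndarray:
    """Control policy conditioned on the task embedding: the embedding
    is concatenated with the current observation before the action map."""
    return np.concatenate([obs, task_emb]) @ W_ctrl

demo = rng.standard_normal((10, OBS_DIM))  # one (simulated) demonstration
obs = rng.standard_normal(OBS_DIM)         # current robot observation
action = policy(obs, embed_demo(demo))
print(action.shape)  # -> (4,)
```

At test time, a single new demonstration is embedded once and then reused to condition the policy at every control step, which is what makes the approach one-shot.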
Pages: 3533-3539
Page count: 7