Imitation from Observation using RL and Graph-based Representation of Demonstrations

Cited by: 0
Authors
El Manyari, Yassine [1 ,2 ]
Le Callet, Patrick [2 ]
Dolle, Laurent [1 ]
Affiliations
[1] CEA Tech Pays de la Loire, F-44340 Bouguenais, France
[2] Univ Nantes, Ecole Cent Nantes, CNRS, UMR 6004,LS2N, F-44000 Nantes, France
Source
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA), 2022
DOI
10.1109/ICMLA55696.2022.00202
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Teaching robots behavioral skills by leveraging examples provided by an expert, also referred to as Imitation Learning from Observation (IfO or ILO), is a promising approach for learning novel tasks without requiring a task-specific reward function to be engineered. We propose an RL-based framework to teach robots manipulation tasks from expert observation-only demonstrations. First, a representation model is trained to extract spatial and temporal features from the demonstrations: Graph Neural Networks (GNNs) encode spatial patterns, while LSTMs and Transformers encode temporal features. Second, using an off-the-shelf RL algorithm, the demonstrations are leveraged through the trained representation to guide policy training towards solving the task demonstrated by the expert. We show that our approach compares favorably to state-of-the-art IfO algorithms, achieving a 99% success rate, and transfers well to the real world.
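The two-stage pipeline described in the abstract (a graph-based spatial encoder per frame, a recurrent pass over time, and the resulting embedding used to shape rewards for an off-the-shelf RL algorithm) could be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions: the single normalized graph-convolution layer, the tanh recurrent update standing in for the LSTM, and the negative-distance reward are all illustrative choices, not the paper's actual architecture or reward definition.

```python
import numpy as np

def gcn_layer(adj, feats, weight):
    # One graph-convolution step: symmetric normalization of (A + I),
    # then a linear map and ReLU, i.e. ReLU(D^-1/2 (A+I) D^-1/2 X W).
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(a_hat.sum(axis=1)))
    return np.maximum(d_inv_sqrt @ a_hat @ d_inv_sqrt @ feats @ weight, 0.0)

def encode_demo(graphs, w_spatial, w_temporal):
    # Encode a demonstration given as a sequence of (adjacency, node-feature)
    # pairs: spatial GNN encoding per frame, mean-pooled to a graph-level
    # vector, then a simple tanh recurrence over time (LSTM stand-in).
    h = np.zeros(w_temporal.shape[1])
    for adj, feats in graphs:
        spatial = gcn_layer(adj, feats, w_spatial).mean(axis=0)
        h = np.tanh(np.concatenate([spatial, h]) @ w_temporal)
    return h

def imitation_reward(policy_emb, demo_emb):
    # Reward shaping for the RL stage: the closer the policy rollout's
    # embedding is to the expert demonstration's, the higher the reward.
    return -np.linalg.norm(policy_emb - demo_emb)
```

In this sketch the learned representation replaces a hand-engineered reward: the RL agent maximizes `imitation_reward`, which is zero only when the rollout embedding matches the demonstration embedding exactly.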
Pages: 1258-1265 (8 pages)