Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models

Cited by: 0
Authors
Ghadirzadeh, Ali [1 ]
Poklukar, Petra [2 ]
Arndt, Karol [3 ]
Finn, Chelsea [1 ]
Kyrki, Ville [3 ]
Kragic, Danica [2 ]
Bjorkman, Marten [2 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] KTH Royal Inst Technol, Stockholm, Sweden
[3] Aalto Univ, Espoo, Finland
Keywords
reinforcement learning; policy search; robot learning; deep generative models; representation learning; PRIMITIVES;
DOI
Not available
CLC Number
TP [Automation technology, computer technology];
Discipline Code
0812 ;
Abstract
We present a data-efficient framework for solving sequential decision-making problems that combines reinforcement learning (RL) with latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem, as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models that allow us to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that most influence the performance of the final policy on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared to two state-of-the-art RL methods, GenRL is the only method that can safely and efficiently solve the robotics tasks.
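The two-part policy factorisation described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: the linear sub-policy, the fixed linear decoder (standing in for a pre-trained generative model such as a VAE decoder), and all dimensions are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM, SEQ_LEN, ACTION_DIM = 4, 2, 5, 3

# (i) Sub-policy: maps a state to the parameters (mean, std) of a
# Gaussian distribution over the action latent variable.
W_mu = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))
W_logstd = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))

def sub_policy(state):
    mu = W_mu @ state
    std = np.exp(W_logstd @ state)  # log-std parameterisation keeps std > 0
    return mu, std

# (ii) Generative model: maps a latent sample to a full sequence of motor
# actions. In GenRL this model is trained unsupervised on valid action
# sequences; here a fixed linear decoder stands in for it.
W_dec = rng.normal(scale=0.1, size=(SEQ_LEN * ACTION_DIM, LATENT_DIM))

def decode(z):
    return (W_dec @ z).reshape(SEQ_LEN, ACTION_DIM)

def act(state):
    """Sample a latent from the sub-policy, then decode it into actions."""
    mu, std = sub_policy(state)
    z = mu + std * rng.normal(size=LATENT_DIM)  # reparameterised sample
    return decode(z)

state = rng.normal(size=STATE_DIM)
actions = act(state)
print(actions.shape)  # (5, 3): a sequence of 5 motor actions, 3-dim each
```

Because exploration happens only in the low-dimensional latent space and every latent decodes to a sequence the generative model considers valid, the RL problem the sub-policy faces is both smaller and safer than searching directly over raw motor commands.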
Pages: 37