Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models

Cited by: 0
Authors
Ghadirzadeh, Ali [1 ]
Poklukar, Petra [2 ]
Arndt, Karol [3 ]
Finn, Chelsea [1 ]
Kyrki, Ville [3 ]
Kragic, Danica [2 ]
Bjorkman, Marten [2 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] KTH Royal Inst Technol, Stockholm, Sweden
[3] Aalto Univ, Espoo, Finland
Keywords
reinforcement learning; policy search; robot learning; deep generative models; representation learning; PRIMITIVES;
DOI
Not available
CLC Number
TP [Automation technology, computer technology];
Discipline Code
0812 ;
Abstract
We present a data-efficient framework for solving sequential decision-making problems that combines reinforcement learning (RL) with latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem, as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models that allow us to predict the performance of the RL policy training prior to the actual training on a physical robot. We experimentally determine the characteristics of generative models that most influence the performance of the final policy on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared to two state-of-the-art RL methods, GenRL is the only method that can safely and efficiently solve the robotics tasks.
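The two-part policy factorisation described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the paper's architecture: the linear sub-policy, the fixed linear decoder (standing in for a pre-trained generative model such as a VAE decoder), and all dimensions are assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

STATE_DIM, LATENT_DIM, SEQ_LEN, ACTION_DIM = 4, 2, 5, 3

# (i) Sub-policy: maps a state to the parameters (mean, std) of a
# Gaussian distribution over the action latent variable.
W_mu = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))
W_logstd = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))

def sub_policy(state):
    mu = W_mu @ state
    std = np.exp(W_logstd @ state)  # log-std parameterisation keeps std > 0
    return mu, std

# (ii) Generative model: maps a latent sample to a full sequence of motor
# actions. In GenRL this model is trained unsupervised on valid action
# sequences; here a fixed linear decoder stands in for it.
W_dec = rng.normal(scale=0.1, size=(SEQ_LEN * ACTION_DIM, LATENT_DIM))

def decode(z):
    return (W_dec @ z).reshape(SEQ_LEN, ACTION_DIM)

def act(state):
    """Sample a latent from the sub-policy, then decode it into actions."""
    mu, std = sub_policy(state)
    z = mu + std * rng.normal(size=LATENT_DIM)  # reparameterised sample
    return decode(z)

state = rng.normal(size=STATE_DIM)
actions = act(state)
print(actions.shape)  # (5, 3): a sequence of 5 motor actions, 3-dim each
```

Because exploration happens only in the low-dimensional latent space and every latent decodes to a sequence the generative model considers valid, the RL problem the sub-policy faces is both smaller and safer than searching directly over raw motor commands.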
Pages: 37