Training and Evaluation of Deep Policies Using Reinforcement Learning and Generative Models

Cited by: 0
Authors
Ghadirzadeh, Ali [1 ]
Poklukar, Petra [2 ]
Arndt, Karol [3 ]
Finn, Chelsea [1 ]
Kyrki, Ville [3 ]
Kragic, Danica [2 ]
Bjorkman, Marten [2 ]
Affiliations
[1] Stanford Univ, Stanford, CA 94305 USA
[2] KTH Royal Inst Technol, Stockholm, Sweden
[3] Aalto Univ, Espoo, Finland
Keywords
reinforcement learning; policy search; robot learning; deep generative models; representation learning; PRIMITIVES;
DOI
Not available
CLC Number
TP [Automation and computer technology]
Discipline Code
0812
Abstract
We present a data-efficient framework for solving sequential decision-making problems which exploits the combination of reinforcement learning (RL) and latent variable generative models. The framework, called GenRL, trains deep policies by introducing an action latent variable such that the feed-forward policy search can be divided into two parts: (i) training a sub-policy that outputs a distribution over the action latent variable given a state of the system, and (ii) unsupervised training of a generative model that outputs a sequence of motor actions conditioned on the latent action variable. GenRL enables safe exploration and alleviates the data-inefficiency problem as it exploits prior knowledge about valid sequences of motor actions. Moreover, we provide a set of measures for evaluating generative models, enabling us to predict the performance of RL policy training prior to the actual training on a physical robot. We experimentally determine which characteristics of generative models have the most influence on the performance of the final policy training on two robotics tasks: shooting a hockey puck and throwing a basketball. Furthermore, we empirically demonstrate that, compared to two state-of-the-art RL methods, GenRL is the only method that can safely and efficiently solve both robotics tasks.
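The two-part policy structure described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: the dimensions, the linear Gaussian sub-policy, and the linear decoder standing in for a pretrained generative model are all hypothetical choices for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): state, action latent,
# and the length/width of the decoded motor-action sequence.
STATE_DIM, LATENT_DIM, SEQ_LEN, ACTION_DIM = 4, 2, 5, 3

class SubPolicy:
    """Part (i): maps a state to a Gaussian distribution over the action latent."""
    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(LATENT_DIM, STATE_DIM))
        self.log_std = np.zeros(LATENT_DIM)

    def sample(self, state):
        mean = self.W @ state
        return mean + np.exp(self.log_std) * rng.normal(size=LATENT_DIM)

class Decoder:
    """Part (ii): stand-in for a pretrained generative model that decodes
    an action latent into a full sequence of motor actions."""
    def __init__(self):
        self.W = rng.normal(scale=0.1, size=(SEQ_LEN * ACTION_DIM, LATENT_DIM))

    def decode(self, z):
        return (self.W @ z).reshape(SEQ_LEN, ACTION_DIM)

policy, decoder = SubPolicy(), Decoder()
state = rng.normal(size=STATE_DIM)
z = policy.sample(state)        # (i) state -> sample of the action latent
actions = decoder.decode(z)     # (ii) latent -> valid motor-action sequence
print(actions.shape)            # (5, 3)
```

Because the decoder is trained (unsupervised) only on valid motor sequences, RL exploration happens in the low-dimensional latent space, which is what gives the safety and data-efficiency benefits the abstract claims.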
Pages: 37