Probabilistic inference for determining options in reinforcement learning

Cited: 0
Authors
Christian Daniel
Herke van Hoof
Jan Peters
Gerhard Neumann
Affiliations
[1] Technische Universität Darmstadt
[2] Bosch Corporate Research
[3] Cognitive Systems
[4] Max-Planck-Institut für Intelligente Systeme
Source
Machine Learning | 2016 / Volume 104
Keywords
Reinforcement learning; Robot learning; Options; Semi-Markov decision process
DOI
Not available
Abstract
Tasks that require many sequential decisions or complex solutions are hard to solve using conventional reinforcement learning algorithms. Based on the semi-Markov decision process (SMDP) setting and the option framework, we propose a model that aims to alleviate these difficulties. Instead of learning a single monolithic policy, the agent learns a set of simpler sub-policies as well as the initiation and termination probabilities for each of those sub-policies. While existing option learning algorithms frequently require manual specification of components such as the sub-policies, we present an algorithm that infers all relevant components of the option framework from data. Furthermore, the proposed approach is based on parametric option representations and works well in combination with current policy search methods, which are particularly well suited for continuous real-world tasks. We present results on SMDPs with discrete as well as continuous state-action spaces. The results show that the presented algorithm can combine simple sub-policies to solve complex tasks and can improve learning performance on simpler tasks.
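The abstract describes options that each consist of an initiation probability, a parametric sub-policy, and a termination probability, executed in an SMDP. The following minimal Python sketch illustrates that structure under stated assumptions: the linear-Gaussian sub-policies, sigmoid terminations, softmax initiation gating, and all names (Option, sample_option, run_episode, env_step) are illustrative choices, not the paper's actual implementation.

```python
import numpy as np

# Minimal sketch of the option framework from the abstract. Each option
# carries a parametric sub-policy pi(a|s,o), an initiation score used in
# p(o|s), and a termination probability p(b=1|s,o). The parameterizations
# below are assumptions for illustration, not the authors' model.

class Option:
    def __init__(self, dim_state, dim_action, rng):
        self.K = rng.normal(size=(dim_action, dim_state))  # sub-policy gains
        self.cov = 0.1 * np.eye(dim_action)                # exploration noise
        self.w_init = rng.normal(size=dim_state)           # initiation weights
        self.w_term = rng.normal(size=dim_state)           # termination weights

    def action(self, s, rng):
        # Linear-Gaussian sub-policy: a ~ N(K s, cov).
        return rng.multivariate_normal(self.K @ s, self.cov)

    def termination_prob(self, s):
        # Sigmoid termination probability p(b = 1 | s, o).
        return 1.0 / (1.0 + np.exp(-self.w_term @ s))


def sample_option(options, s, rng):
    # Initiation distribution p(o | s) as a softmax over per-option scores.
    scores = np.array([o.w_init @ s for o in options])
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()
    return options[rng.choice(len(options), p=probs)]


def run_episode(env_step, s0, options, horizon, rng):
    # SMDP-style execution: the active option keeps control until its
    # termination event fires, then a new option is initiated.
    s, active, trajectory = s0, None, []
    for _ in range(horizon):
        if active is None or rng.random() < active.termination_prob(s):
            active = sample_option(options, s, rng)
        a = active.action(s, rng)
        s_next = env_step(s, a)
        trajectory.append((s, a, s_next))
        s = s_next
    return trajectory
```

Any `env_step(s, a) -> s_next` callable can stand in for the environment, e.g. `run_episode(lambda s, a: s + 0.1 * a, np.zeros(2), [Option(2, 2, rng) for _ in range(3)], horizon=50, rng=np.random.default_rng(0))` for two-dimensional states and actions (with `rng = np.random.default_rng(0)` defined first).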
Pages: 337-357
Number of pages: 20