Bayes-adaptive hierarchical MDPs

Cited by: 0
Authors
Ngo Anh Vien
SeungGwan Lee
TaeChoong Chung
Affiliations
[1] University of Stuttgart, Machine Learning and Robotics Laboratory
[2] Kyung Hee University, College of Liberal Arts
[3] Kyung Hee University, Department of Computer Engineering
Source
Applied Intelligence | 2016 / Vol. 45
Keywords
Reinforcement learning; Bayesian reinforcement learning; Hierarchical reinforcement learning; MDP; POMDP; POSMDP; Monte-Carlo tree search; Hierarchical Monte-Carlo planning;
DOI
Not available
Abstract
Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make decisions sequentially in order to optimize a particular performance measure. To achieve this goal, the agent must choose between 1) exploiting previously acquired knowledge, which may end up at a local optimum, and 2) exploring to gather new knowledge that is expected to improve current performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for its ability to trade off exploitation and exploration optimally via belief planning, i.e. by solving an equivalent partially observable Markov decision process (POMDP). However, solving that POMDP often suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework that integrates temporal abstraction into BRL and ultimately yields a hierarchical POMDP formulation, which can be solved online by a hierarchical sample-based planning solver; and 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro actions to accelerate learning. In the experiments, we demonstrate that the proposed approach scales to much larger problems and that the agent discovers useful subgoals that speed up Bayesian reinforcement learning.
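The abstract frames Bayesian model-based RL as belief planning over an unknown MDP. The following Python sketch is a minimal illustration of that idea only, not the authors' hierarchical algorithm: it keeps a Dirichlet posterior over the transition model of a tiny discrete MDP and plans by sampling models from the posterior and running shallow Monte-Carlo rollouts. All state/action sizes, rewards, and hyper-parameters are assumptions made for illustration.

```python
import numpy as np

# Minimal sketch (not the paper's hierarchical method): Bayesian model-based RL
# on a small discrete MDP.  The agent maintains a Dirichlet posterior over the
# transition dynamics and plans by sampling models from that posterior and
# running shallow Monte-Carlo rollouts -- a crude stand-in for belief planning
# in the induced POMDP.  Sizes, rewards, and hyper-parameters are illustrative.

N_STATES, N_ACTIONS, GAMMA = 4, 2, 0.95
rng = np.random.default_rng(0)

# True (hidden) dynamics and rewards, used only to generate experience.
TRUE_P = rng.dirichlet(np.ones(N_STATES), size=(N_STATES, N_ACTIONS))
REWARD = rng.uniform(0.0, 1.0, size=(N_STATES, N_ACTIONS))

# Belief: one Dirichlet count vector per (state, action) over successor states.
alpha = np.ones((N_STATES, N_ACTIONS, N_STATES))

def sample_model(counts):
    """Draw one transition model from the current Dirichlet posterior."""
    return np.array([[rng.dirichlet(counts[s, a]) for a in range(N_ACTIONS)]
                     for s in range(N_STATES)])

def rollout_value(P, s, a, depth=15):
    """Monte-Carlo return of taking `a` in `s` under the sampled model `P`."""
    total, discount = 0.0, 1.0
    for _ in range(depth):
        total += discount * REWARD[s, a]
        s = rng.choice(N_STATES, p=P[s, a])
        a = rng.integers(N_ACTIONS)          # uniform-random rollout policy
        discount *= GAMMA
    return total

def plan(s, n_samples=20, n_rollouts=5):
    """Pick the action with the best average rollout return over posterior samples."""
    q = np.zeros(N_ACTIONS)
    for _ in range(n_samples):
        P = sample_model(alpha)
        for a in range(N_ACTIONS):
            q[a] += np.mean([rollout_value(P, s, a) for _ in range(n_rollouts)])
    return int(np.argmax(q))

# Interaction loop: act, observe a transition, update the belief (Dirichlet counts).
s = 0
for step in range(200):
    a = plan(s)
    s_next = rng.choice(N_STATES, p=TRUE_P[s, a])
    alpha[s, a, s_next] += 1.0
    s = s_next

print("posterior mean dynamics for (s=0, a=0):", alpha[0, 0] / alpha[0, 0].sum())
```

Sampling a full model per planning pass and using flat random rollouts is a deliberate simplification; the paper's approach additionally exploits temporal abstraction (macro actions) and a hierarchical sample-based planner on the induced POMDP.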
Pages: 112-126
Number of pages: 14