Model-based Learning with Bayesian and MAXQ Value Function Decomposition for Hierarchical Task

Cited by: 2
Authors
Dai, Zhaohui [1]
Chen, Xin [1]
Cao, Weihua [1]
Wu, Min [1]
Affiliations
[1] Central South University, School of Information Science and Engineering, Changsha, Hunan, People's Republic of China
Source
2010 8TH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION (WCICA) | 2010
Keywords
Bayesian; MAXQ value function decomposition; prioritized sweeping; reinforcement learning; REINFORCEMENT; TIME
DOI
10.1109/WCICA.2010.5554020
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Improving learning efficiency is always a key issue in the implementation of reinforcement learning. This paper combines the advantages of hierarchical learning and model-based learning in an improved algorithm, named Bayesian-MAXQ learning. Several modifications are developed to handle value updates across the hierarchy, and the possible performance degradation caused by prioritized sweeping is reduced to a negligible level. Simulation results show that Bayesian-MAXQ learning is highly efficient and can serve as a good framework for further study of hierarchical model-based learning.
Pages: 676-681
Page count: 6