Model-based Learning with Bayesian and MAXQ Value Function Decomposition for Hierarchical Task

Cited by: 2
Authors
Dai, Zhaohui [1]
Chen, Xin [1]
Cao, Weihua [1]
Wu, Min [1]
Affiliations
[1] Central South University, School of Information Science and Engineering, Changsha, Hunan, People's Republic of China
Source
2010 8th World Congress on Intelligent Control and Automation (WCICA) | 2010
Keywords
Bayesian; MAXQ value function decomposition; prioritized sweeping; reinforcement learning; REINFORCEMENT; TIME;
DOI
10.1109/WCICA.2010.5554020
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Improving learning efficiency is a key issue in implementing reinforcement learning. This paper combines the advantages of hierarchical learning and model-based learning to introduce an improved algorithm, named Bayesian-MAXQ learning, in which several modifications are developed to handle value updates across the hierarchy, while the possible performance damage caused by prioritized sweeping is reduced to a trivial level. Simulation results show that Bayesian-MAXQ learning is highly efficient and can serve as a good framework for further study of hierarchical model-based learning.
Pages: 676-681
Number of pages: 6