Model-based Learning with Bayesian and MAXQ Value Function Decomposition for Hierarchical Task

Cited by: 2

Authors:

Dai, Zhaohui [1]

Chen, Xin [1]

Cao, Weihua [1]

Wu, Min [1]

Affiliations:

[1] Central South University, School of Information Science and Engineering, Changsha, Hunan, People's Republic of China

Source:

2010 8th World Congress on Intelligent Control and Automation (WCICA) | 2010

Keywords:

Bayesian; MAXQ value function decomposition; prioritized sweeping; reinforcement learning; REINFORCEMENT; TIME;

DOI:

10.1109/WCICA.2010.5554020

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

Improving learning efficiency is a key issue in implementing reinforcement learning. This paper combines the advantages of hierarchical learning and model-based learning in an improved algorithm, named Bayesian-MAXQ learning. Several modifications are developed to handle value updates across the hierarchy, and the possible performance loss introduced by prioritized sweeping is reduced to a trivial level. Simulation results show that Bayesian-MAXQ learning is highly efficient and can serve as a good framework for further study of hierarchical model-based learning.


Pages: 676-681

Page count: 6

