Combining reinforcement learning with symbolic planning

Cited by: 0
Authors
Grounds, Matthew [1 ]
Kudenko, Daniel [1 ]
Affiliations
[1] Univ York, Dept Comp Sci, York YO10 5DD, N Yorkshire, England
Source
ADAPTIVE AGENTS AND MULTI-AGENT SYSTEMS | 2008 / Vol. 4865
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
One of the major difficulties in applying Q-learning to real-world domains is the sharp increase in the number of learning steps required to converge towards an optimal policy as the size of the state space is increased. In this paper we propose a method, PLANQ-learning, that couples a Q-learner with a STRIPS planner. The planner shapes the reward function, and thus guides the Q-learner quickly to the optimal policy. We demonstrate empirically that this combination of high-level reasoning and low-level learning displays significant improvements in scaling-up behaviour as the state space grows larger, compared to both standard Q-learning and hierarchical Q-learning methods.
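The abstract gives only a high-level view of PLANQ-learning, so the following is a minimal, hypothetical sketch of the general idea: a tabular Q-learner whose reward is shaped by progress along a precomputed symbolic plan. The `env` interface (`reset`, `step`, `actions`), the representation of the plan as a list of goal-test predicates, and the use of potential-based shaping are all assumptions made for illustration; the paper's actual coupling of the STRIPS planner and the Q-learner may differ.

```python
import random
from collections import defaultdict

def plan_progress(state, plan):
    """Hypothetical helper: index of the furthest plan step whose goal
    test the current state already satisfies. `plan` is assumed to be a
    list of state -> bool predicates derived from the STRIPS plan."""
    progress = 0
    for i, achieved in enumerate(plan):
        if achieved(state):
            progress = i + 1
    return progress

def planq_learn(env, plan, episodes=500, alpha=0.1, gamma=0.95, eps=0.1):
    """Tabular Q-learning with a reward shaped by plan progress.
    Potential-based shaping is used here, with potential
    Phi(s) = number of plan steps already achieved, so the optimal
    policy is provably unchanged; the paper's exact shaping scheme
    may differ."""
    Q = defaultdict(float)  # keyed by (state, action)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # epsilon-greedy action selection over the assumed
            # env.actions list
            if random.random() < eps:
                action = random.choice(env.actions)
            else:
                action = max(env.actions, key=lambda a: Q[(state, a)])
            next_state, reward, done = env.step(action)
            # shaping term F = gamma * Phi(s') - Phi(s): the agent is
            # rewarded for advancing along the symbolic plan
            shaped = reward + gamma * plan_progress(next_state, plan) \
                            - plan_progress(state, plan)
            best_next = 0.0 if done else max(
                Q[(next_state, a)] for a in env.actions)
            # standard one-step Q-learning update on the shaped reward
            Q[(state, action)] += alpha * (
                shaped + gamma * best_next - Q[(state, action)])
            state = next_state
    return Q
```

The shaping bonus directs exploration toward states that satisfy successive plan-step goals, which is one plausible way the high-level plan could guide the low-level learner toward the optimal policy more quickly than unshaped Q-learning.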
Pages: 75-86
Page count: 12