Reinforcement learning based on local state feature learning and policy adjustment

Cited by: 12
Authors
Lin, YP [1]
Li, XY [1]
Affiliation
[1] Hunan Univ, Coll Comp & Commun, Changsha 410082, Peoples R China
Keywords
reinforcement learning; agent; Markov decision processes; temporal-difference learning; local state feature
DOI
10.1016/S0020-0255(03)00006-9
CLC number
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Extending reinforcement learning (RL) to large state spaces inevitably runs into the curse of dimensionality, so improving the agent's learning efficiency is essential for the practical application of RL. Consider learning to optimally solve Markov decision problems in a particular domain: if the domain has characteristics attributable to each state, the agent may be able to exploit these features to direct future learning. This paper first defines the local state feature; a state feature function is then used to generate the local state features of a state, and a weight function is introduced to bias the current policy toward actions worth exploring. On this basis, an improved SARSA algorithm, Feature-SARSA, is proposed. We validate the new algorithm experimentally on a complex domain, Sokoban. The results show that the new algorithm performs better. (C) 2003 Elsevier Science Inc. All rights reserved.
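The abstract only outlines Feature-SARSA, so the following is a minimal Python sketch of the idea: a tabular SARSA learner whose exploration step is biased by a weight function over local state features. The toy gridworld, the state_features function, and the weight function below are illustrative placeholders assumed for this sketch, not the paper's actual definitions (which target Sokoban).

import random
from collections import defaultdict

# Hypothetical 4-connected gridworld standing in for a Sokoban-like domain.
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
GRID, GOAL = 5, (4, 4)

def step(state, action):
    """One environment transition; moves off the grid are clamped in place."""
    r, c = state
    dr, dc = action
    nxt = (min(max(r + dr, 0), GRID - 1), min(max(c + dc, 0), GRID - 1))
    return nxt, (1.0 if nxt == GOAL else -0.01), nxt == GOAL

def state_features(state):
    """Placeholder 'local state feature' function: for each action,
    1 if the move stays on the grid, 0 if it would hit the boundary."""
    r, c = state
    return tuple(int(0 <= r + dr < GRID and 0 <= c + dc < GRID)
                 for dr, dc in ACTIONS)

def weight(features, a_idx):
    """Placeholder weight function: favour actions the local features
    mark as admissible, so exploration concentrates on moves worth trying."""
    return 1.0 if features[a_idx] else 0.1

def choose_action(Q, state, epsilon):
    """Epsilon-greedy selection with feature-weighted (not uniform) exploration."""
    feats = state_features(state)
    if random.random() < epsilon:
        w = [weight(feats, i) for i in range(len(ACTIONS))]
        return random.choices(range(len(ACTIONS)), weights=w)[0]
    return max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])

def feature_sarsa(episodes=500, alpha=0.1, gamma=0.95, epsilon=0.1):
    Q = defaultdict(float)
    for _ in range(episodes):
        s = (0, 0)
        a = choose_action(Q, s, epsilon)
        done = False
        while not done:
            s2, r, done = step(s, ACTIONS[a])
            a2 = choose_action(Q, s2, epsilon)
            # Standard SARSA temporal-difference update.
            target = r + (0.0 if done else gamma * Q[(s2, a2)])
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s, a = s2, a2
    return Q

if __name__ == "__main__":
    Q = feature_sarsa()
    print("Q(start, right):", Q[((0, 0), 3)])

The only change from plain SARSA here is in choose_action: exploratory moves are drawn in proportion to the weight function rather than uniformly, which is one plausible reading of "adjusting the current policy toward actions worth exploring".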
Pages: 59-70
Page count: 12