Approximate policy iteration with a policy language bias: Solving relational Markov decision processes

Cited by: 0
Authors
Fern, Alan [1 ]
Yoon, Sungwook [2 ]
Givan, Robert [2 ]
Affiliations
[1] School of Electrical Engineering and Computer Science, Oregon State University, United States
[2] School of Electrical and Computer Engineering, Purdue University, United States
Source
Journal of Artificial Intelligence Research | 2006 / Vol. 25
Abstract
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. © 2006 AI Access Foundation. All rights reserved.
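The abstract describes rollout-based approximate policy iteration in which the learning step happens in policy space rather than value-function space. The following self-contained Python sketch illustrates that loop on a toy chain-walk MDP with sparse goal reward. The toy MDP, the rollout parameters, and the table-based stand-in for the paper's relational policy-language learner are all illustrative assumptions, not the authors' implementation, and the paper's random-walk bootstrapping routine is omitted here.

import random

N = 10                      # chain states 0 .. N-1; state N-1 is the goal
ACTIONS = (-1, +1)          # move left / move right

def step(state, action):
    """Toy stochastic transition: the chosen move slips 10% of the time."""
    if random.random() < 0.1:
        action = -action
    next_state = min(max(state + action, 0), N - 1)
    reward = 1.0 if next_state == N - 1 else 0.0   # sparse, goal-like reward
    return next_state, reward

def rollout_value(state, policy, horizon=20):
    """Estimate the return of following `policy` from `state` for `horizon` steps."""
    total = 0.0
    for _ in range(horizon):
        state, reward = step(state, policy(state))
        total += reward
    return total

def improved_action(state, policy, n_samples=20):
    """Rollout-based policy improvement: try each action once, then follow `policy`."""
    def q_estimate(action):
        returns = []
        for _ in range(n_samples):
            next_state, reward = step(state, action)
            returns.append(reward + rollout_value(next_state, policy))
        return sum(returns) / n_samples
    return max(ACTIONS, key=q_estimate)

def learn_policy(training_set):
    """Stand-in for the paper's relational policy learner: memorize the
    (state, improved action) pairs; a real learner would generalize."""
    table = dict(training_set)
    return lambda s: table.get(s, +1)

def approximate_policy_iteration(n_iterations=5, n_states=30):
    policy = lambda s: random.choice(ACTIONS)       # uninformed initial policy
    for _ in range(n_iterations):
        states = [random.randrange(N) for _ in range(n_states)]
        # The learning step is in policy space: fit a policy directly to the
        # rollout-improved actions instead of fitting a value function.
        training_set = [(s, improved_action(s, policy)) for s in states]
        policy = learn_policy(training_set)
    return policy

if __name__ == "__main__":
    pi = approximate_policy_iteration()
    print([pi(s) for s in range(N)])                # mostly +1: head toward the goal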
DOI
Not available
Document type
Journal article (JA)
Pages: 75-118