Approximate policy iteration with a policy language bias: Solving relational Markov decision processes

Cited by: 0
Authors
Fern, Alan [1 ]
Yoon, Sungwook [2 ]
Givan, Robert [2 ]
Affiliations
[1] School of Electrical Engineering and Computer Science, Oregon State University, United States
[2] School of Electrical and Computer Engineering, Purdue University, United States
Source
Journal of Artificial Intelligence Research | 2006 / Vol. 25
Abstract
We study an approach to policy selection for large relational Markov Decision Processes (MDPs). We consider a variant of approximate policy iteration (API) that replaces the usual value-function learning step with a learning step in policy space. This is advantageous in domains where good policies are easier to represent and learn than the corresponding value functions, which is often the case for the relational MDPs we are interested in. In order to apply API to such problems, we introduce a relational policy language and corresponding learner. In addition, we introduce a new bootstrapping routine for goal-based planning domains, based on random walks. Such bootstrapping is necessary for many large relational MDPs, where reward is extremely sparse, as API is ineffective in such domains when initialized with an uninformed policy. Our experiments show that the resulting system is able to find good policies for a number of classical planning domains and their stochastic variants by solving them as extremely large relational MDPs. The experiments also point to some limitations of our approach, suggesting future work. © 2006 AI Access Foundation. All rights reserved.
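The abstract describes rollout-based approximate policy iteration in which the learning step happens in policy space rather than value-function space. The following self-contained Python sketch illustrates that loop on a toy chain-walk MDP with sparse goal reward. The toy MDP, the rollout parameters, and the table-based stand-in for the paper's relational policy-language learner are all illustrative assumptions, not the authors' implementation, and the paper's random-walk bootstrapping routine is omitted here.

import random

N = 10                      # chain states 0 .. N-1; state N-1 is the goal
ACTIONS = (-1, +1)          # move left / move right

def step(state, action):
    """Toy stochastic transition: the chosen move slips 10% of the time."""
    if random.random() < 0.1:
        action = -action
    next_state = min(max(state + action, 0), N - 1)
    reward = 1.0 if next_state == N - 1 else 0.0   # sparse, goal-like reward
    return next_state, reward

def rollout_value(state, policy, horizon=20):
    """Estimate the return of following `policy` from `state` for `horizon` steps."""
    total = 0.0
    for _ in range(horizon):
        state, reward = step(state, policy(state))
        total += reward
    return total

def improved_action(state, policy, n_samples=20):
    """Rollout-based policy improvement: try each action once, then follow `policy`."""
    def q_estimate(action):
        returns = []
        for _ in range(n_samples):
            next_state, reward = step(state, action)
            returns.append(reward + rollout_value(next_state, policy))
        return sum(returns) / n_samples
    return max(ACTIONS, key=q_estimate)

def learn_policy(training_set):
    """Stand-in for the paper's relational policy learner: memorize the
    (state, improved action) pairs; a real learner would generalize."""
    table = dict(training_set)
    return lambda s: table.get(s, +1)

def approximate_policy_iteration(n_iterations=5, n_states=30):
    policy = lambda s: random.choice(ACTIONS)       # uninformed initial policy
    for _ in range(n_iterations):
        states = [random.randrange(N) for _ in range(n_states)]
        # The learning step is in policy space: fit a policy directly to the
        # rollout-improved actions instead of fitting a value function.
        training_set = [(s, improved_action(s, policy)) for s in states]
        policy = learn_policy(training_set)
    return policy

if __name__ == "__main__":
    pi = approximate_policy_iteration()
    print([pi(s) for s in range(N)])                # mostly +1: head toward the goal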
DOI
Not available
Document type
Journal article (JA)
Pages: 75-118