Genetic algorithm methods for solving the best stationary policy of finite Markov decision processes

Cited by: 4

Authors:

Chen, HH [1]

Jafari, AA [1]

Affiliation:

[1] New York Inst Technol, Dept Elect Engn & Comp Sci, Old Westbury, NY 11568 USA

Source:

THIRTIETH SOUTHEASTERN SYMPOSIUM ON SYSTEM THEORY (SSST) | 1998

DOI:

10.1109/SSST.1998.660132

Chinese Library Classification (CLC) number:

TP301 [Theory, methods]

Discipline classification code:

081202

Abstract:

This paper describes a heuristic approach to finding the optimal stationary policy of a standard finite Markov decision process (MDP). For an MDP, the so-called policy-improvement algorithm can be used to determine optimal policies: it starts from an arbitrary policy f(0) and produces a sequence of improved policies f(1), f(2), f(3), ..., f(k) until an optimal policy is reached. In this paper, we propose using a genetic algorithm to search for the best policy, which can be regarded as an optimal policy. The method is a three-stage cyclic process consisting of reproduction (selection), recombination (mating), and evaluation (survival of the fittest); the process terminates when a convergence condition is met. The individual with the highest fitness represents the best policy. Finally, the computational advantages of the genetic algorithm method are discussed.
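The abstract contrasts the policy-improvement algorithm with a genetic-algorithm search over stationary policies but does not specify the optimality criterion, operators, or parameters. The following is a minimal Python sketch of both approaches under loudly stated assumptions that do not come from the paper: a discounted-reward criterion with `GAMMA = 0.9`, a hypothetical 3-state, 2-action MDP, a chromosome that stores one action per state, fitness equal to the sum of state values, binary tournament selection, uniform crossover, random-reset mutation, elitism, and a "no improvement for `patience` generations" convergence rule. All function names and parameters are illustrative.

```python
import random
from functools import lru_cache

# Hypothetical 3-state, 2-action MDP (illustrative data, not from the paper).
# P[s][a][t] = probability of moving from state s to state t under action a.
P = [
    [[0.8, 0.2, 0.0], [0.1, 0.0, 0.9]],
    [[0.5, 0.5, 0.0], [0.0, 0.3, 0.7]],
    [[0.0, 0.0, 1.0], [0.6, 0.4, 0.0]],
]
R = [[1.0, 0.0], [0.0, 2.0], [0.5, 3.0]]  # R[s][a] = expected one-step reward
GAMMA = 0.9  # assumed discounted-reward criterion
N_STATES, N_ACTIONS = len(P), len(P[0])


def q_value(v, s, a):
    """One-step lookahead: reward plus discounted expected value of the next state."""
    return R[s][a] + GAMMA * sum(p * v[t] for t, p in enumerate(P[s][a]))


def evaluate(policy, iters=500):
    """Approximate v_f for a stationary policy f by iterating v <- r_f + GAMMA * P_f v."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        v = [q_value(v, s, policy[s]) for s in range(N_STATES)]
    return v


def policy_iteration():
    """Policy improvement: f(0), f(1), ... until no state can be strictly improved."""
    policy = [0] * N_STATES  # arbitrary starting policy f(0)
    while True:
        v = evaluate(policy)
        improved = False
        for s in range(N_STATES):
            best = max(range(N_ACTIONS), key=lambda a: q_value(v, s, a))
            # Switch only on a strict gain so the loop cannot cycle between ties.
            if q_value(v, s, best) > q_value(v, s, policy[s]) + 1e-9:
                policy[s] = best
                improved = True
        if not improved:
            return tuple(policy)


@lru_cache(maxsize=None)
def fitness(policy):
    """Assumed fitness: sum of state values (policy must be a tuple to be cached)."""
    return sum(evaluate(policy))


def genetic_search(pop_size=20, p_mut=0.1, patience=50, max_gens=500, seed=0):
    """GA over stationary policies; a chromosome holds one action per state."""
    rng = random.Random(seed)
    pop = [tuple(rng.randrange(N_ACTIONS) for _ in range(N_STATES))
           for _ in range(pop_size)]

    def select():
        # Reproduction: binary tournament selection (an assumed operator).
        a, b = rng.choice(pop), rng.choice(pop)
        return a if fitness(a) >= fitness(b) else b

    best, stale = max(pop, key=fitness), 0
    for _ in range(max_gens):
        children = [best]  # elitism: the fittest individual always survives
        while len(children) < pop_size:
            mom, dad = select(), select()
            # Recombination: uniform crossover, then per-gene random-reset mutation.
            child = tuple(rng.choice(pair) for pair in zip(mom, dad))
            child = tuple(rng.randrange(N_ACTIONS) if rng.random() < p_mut else g
                          for g in child)
            children.append(child)
        pop = children
        # Evaluation: track the best individual; stop after `patience` stale generations.
        gen_best = max(pop, key=fitness)
        if fitness(gen_best) > fitness(best) + 1e-12:
            best, stale = gen_best, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best


if __name__ == "__main__":
    pi, ga = policy_iteration(), genetic_search()
    print("policy iteration: ", pi, round(fitness(pi), 4))
    print("genetic algorithm:", ga, round(fitness(ga), 4))
```

Running the script prints the policy found by each method together with its fitness; on a model this small both should reach the same fitness. Summing state values is a reasonable fitness here because an optimal stationary policy maximizes every state's value simultaneously, so any policy that maximizes the sum is itself optimal. Caching `fitness` matters because the GA re-evaluates the same chromosomes many times.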

Pages: 538-543

Number of pages: 6