About Q-values of Monte Carlo method

Cited by: 0
Author
Uemura, Wataru [1 ]
Affiliation
[1] Ryukoku Univ, Dept Elect & Informat, Kyoto, Japan
Source
2008 PROCEEDINGS OF SICE ANNUAL CONFERENCE, VOLS 1-7 | 2008
Keywords
Reinforcement Learning; Profit Sharing method; Monte Carlo method
DOI
Not available
CLC classification
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The Profit Sharing method is one of the reinforcement learning methods. Profit Sharing works well on Partially Observable Markov Decision Processes (POMDPs) because it is a typical non-bootstrap method whose Q-values are usually handled accumulatively. Profit Sharing, however, does not work well under probabilistic state transitions. In this paper we propose a novel learning method, similar to the Monte Carlo method, that works well under probabilistic state transitions, and we discuss the Q-values of the proposed method. In environments with deterministic state transitions, we show that the proposed method performs the same as conventional Profit Sharing, and under probabilistic state transitions we show that it outperforms conventional Profit Sharing.
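The abstract's description of Profit Sharing as a non-bootstrap method with accumulatively handled Q-values can be illustrated with a minimal sketch. This is a generic, hedged illustration of Profit Sharing-style credit assignment, not the paper's exact algorithm: at episode end, the terminal reward is propagated backward along the visited (state, action) trace with a decaying credit function, and each Q-value is updated by accumulation rather than by bootstrapping from successor estimates. The function name, trace representation, and geometric decay rate are assumptions for illustration.

```python
from collections import defaultdict

def profit_sharing_update(q, episode, reward, decay=0.5):
    """Accumulate decayed credit onto every (state, action) pair in the
    episode, most recent step first. No bootstrapping: successor Q-values
    are never consulted, only the episode's own reward."""
    credit = reward
    for state, action in reversed(episode):
        q[(state, action)] += credit  # accumulative update, not averaging
        credit *= decay               # geometric credit-assignment function

# Q-table maps (state, action) -> accumulated value.
q = defaultdict(float)
episode = [("s0", "right"), ("s1", "right"), ("s2", "goal")]
profit_sharing_update(q, episode, reward=1.0)
```

A Monte Carlo variant, as the abstract suggests, would instead average complete returns per (state, action) pair; because the update above only accumulates along actually experienced traces, it remains applicable when states are aliased, which is why non-bootstrap methods of this kind behave well on POMDPs.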
Pages: 1953-1956
Page count: 4