A rationally oriented forgettable profit sharing

Cited by: 1

Authors:

Koujaku, Sadamori [1]

Watanabe, Kota [1]

Igarashi, Hajima [1]

Affiliation:

[1] Hokkaido Univ, Sapporo, Hokkaido 060, Japan

Source:

ELECTRONICS AND COMMUNICATIONS IN JAPAN | 2013 / Vol. 96 / No. 07

Keywords:

reinforcement learning; profit sharing; Miyazaki rational theorem

DOI:

10.1002/ecj.11461

Chinese Library Classification (CLC):

TM [Electrical technology]; TN [Electronic technology, communication technology]

Subject classification codes:

0808; 0809

Abstract:

In this paper, the Rationally Oriented Forgettable Profit Sharing (RFPS) method for reinforcement learning is proposed. Although profit sharing (PS) performs well in real environments, its learning is often slow in long-term tasks because it is difficult to determine an appropriate discount rate that satisfies the Miyazaki rational theorem. Several rationality-relaxed PS methods work well for such tasks. However, these PS methods may result in many irrational loops. The proposed method achieves rationality by forgetting the reinforced irrational loops. It can be easily combined with ordinary PS methods and performs well in long-term tasks. Simulation results show that the proposed method learns more efficiently than conventional PS methods. (c) 2013 Wiley Periodicals, Inc. Electron Comm Jpn, 96(7): 11-18, 2013; Published online in Wiley Online Library (wileyonlinelibrary.com). DOI 10.1002/ecj.11461
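
For readers unfamiliar with profit sharing, the following is a minimal sketch of an ordinary PS credit-assignment step, not the RFPS method itself, whose details the abstract does not give. It assumes a geometrically decaying reinforcement function. The names `profit_sharing_update`, `weights`, `episode`, and `decay` are hypothetical and chosen only for illustration. The sketch does not check whether `decay` satisfies the Miyazaki rational theorem.

```python
from collections import defaultdict


def profit_sharing_update(weights, episode, reward, decay):
    """Distribute `reward` backward over the rules fired in one episode.

    weights: mapping from (state, action) rule to its accumulated weight
    episode: list of (state, action) pairs in the order they were fired;
             the last pair is the rule that obtained the reward
    decay:   assumed geometric discount rate, 0 < decay < 1 (illustrative)
    """
    credit = reward
    # Walk the episode from the rewarded rule back to the first rule,
    # shrinking the credit by `decay` at every step.
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay
    return weights


# Usage: a three-step episode that ends with a reward of 1.0.
weights = defaultdict(float)
episode = [("s0", "right"), ("s1", "right"), ("s2", "up")]
profit_sharing_update(weights, episode, reward=1.0, decay=0.5)
```

With a small `decay`, rules fired early in a long episode receive almost no credit, which is one way to picture the slow learning in long-term tasks that the abstract mentions.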

Pages: 11-18

Page count: 8