Profit sharing using a dynamic reinforcement function considering expectation value of reinforcement

Cited by: 1
Authors
Tamashima, Daisuke [1 ]
Koakutsu, Seiichi [1 ]
Okamoto, Takashi [1 ]
Hirata, Hironori [1 ]
Affiliations
[1] Chiba University, 1-33 Yayoi-cho, Inage-ku, Chiba 265-8522, Japan
Keywords
Ineffective rule suppression theorem; Profit sharing; Reinforcement function; Reinforcement learning
DOI
10.1541/ieejeiss.129.1339
Abstract
Profit Sharing is an exploitation-oriented reinforcement learning method that aims to adapt a system to a given environment. In Profit Sharing, an agent learns a policy from the reward it receives from the environment when it reaches a goal state, so it is important to design a reinforcement function that distributes the received reward over the action rules in the policy. If the reinforcement function satisfies the ineffective rule suppression theorem, it distributes more reward to effective rules than to ineffective ones, even in the worst case where an ineffective rule is selected infinitely often. The value of such a reinforcement function, however, decreases exponentially with distance from the goal state, so the agent fails to learn an appropriate policy when the episode length from an initial state to the goal state is relatively long. In this paper, we propose a new dynamic reinforcement function that considers the expected value of the reward distributed to each rule. With our reinforcement function, the expected reward distributed to effective rules remains larger than that distributed to ineffective ones, and even when episodes become long the decrease in the value of the reinforcement function is suppressed, so the agent can still learn an appropriate policy. We apply our reinforcement function to Sutton's maze problem and show its effectiveness. © 2009 The Institute of Electrical Engineers of Japan.
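
For reference, the sketch below illustrates the conventional Profit Sharing credit assignment that the abstract contrasts against: at the end of an episode the reward is distributed backwards over the fired rules with a geometrically decreasing reinforcement function f_j = r * (1/S)^j, the standard form that satisfies the ineffective rule suppression theorem when the decay base S exceeds the number of conflicting rules per state. The names (profit_sharing_update, weights, S) are hypothetical illustrations, not the paper's code, and the dynamic expectation-based function proposed in the paper is not reproduced here.

from collections import defaultdict

def profit_sharing_update(weights, episode, reward, S=4.0):
    """Distribute the episode reward over its (state, action) rules.

    weights : dict mapping (state, action) -> rule weight
    episode : list of (state, action) pairs, oldest first, ending at the goal
    S       : decay base of the geometric reinforcement function (S >= L + 1,
              where L is the number of conflicting rules per state)
    """
    for j, rule in enumerate(reversed(episode)):
        # The rule fired j steps before the goal receives f_j = reward / S**j,
        # so credit decays exponentially with distance from the goal state.
        weights[rule] += reward / (S ** j)
    return weights

# Toy usage: the weights then drive greedy or roulette action selection per state.
weights = defaultdict(float)
episode = [("s0", "right"), ("s1", "up"), ("s2", "right")]  # trace ending at the goal
profit_sharing_update(weights, episode, reward=1.0)
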
Pages: 1339-1347
Related papers (12 in total)
  • [1] Yamamura M., Miyazaki K., Kobayashi S., A survey on learning for agents, Journal of JSAI, 10, 5, pp. 23-29, (1995)
  • [2] Watkins C.J.C.H., Dayan P., Technical note: Q-Learning, Machine Learning, 8, pp. 55-68, (1992)
  • [3] Arai S., Miyazaki K., Kobayashi S., Methodology in multi-agent reinforcement learning: Approaches by Q-learning and profit sharing, Journal of JSAI, 13, 5, pp. 609-618, (1998)
  • [4] Miyazaki K., Yamamura M., Kobayashi S., A theory of profit sharing in reinforcement learning, Journal of JSAI, 9, 4, pp. 580-587, (1994)
  • [5] Uemura W., Tatsumi S., About the reinforcement function for profit sharing, Transactions of JSAI, 19, 4, pp. 197-203, (2004)
  • [6] Uemura W., Ueno A., Tatsumi S., A profit sharing method for forgetting past experiences effectively, Transactions of JSAI, 21, 1, pp. 81-93, (2006)
  • [7] Hasegawa Y., Takada S., Nakano H., Arai S., Miyauchi A., A reinforcement learning method using a dynamic reinforcement function based on action selection probability, The IEICE Transactions on Information and Systems, J89D, 4, pp. 788-796, (2006)
  • [8] Nakano H., Miyauchi A., Design of Reinforcement Functions in Profit Sharing Reinforcement Learning, IEICE Technical Report, 106, 574, pp. 1-6, (2007)
  • [9] Kawai H., Ueno A., Tatsumi S., The consideration of rationality of Profit Sharing with roulette action selection, The 19th Annual Conference of JSAI, 1D3-03, (2005)
  • [10] Matsui T., Ohwada H., Rationality of Profit Sharing Based on Expected Value, The 22nd Annual Conference of JSAI, 3A2-1, (2008)