About an initial value of Q-value in profit sharing

Cited by: 0
Authors
Uemura, Wataru
Ueno, Atsushi
Tatsumi, Shoji
Institution
Source
2006 SICE-ICASE INTERNATIONAL JOINT CONFERENCE, VOLS 1-13 | 2006
Keywords
reinforcement learning; profit sharing; exploration and exploitation
DOI
Not available
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Profit Sharing, one of the reinforcement learning methods, distributes the received reward among the Q-values of the rules fired during an episode, and these accumulated Q-values are then used for action selection. In this paper, we discuss the initial value of the Q-value and propose a method for setting it. If the initial value is much larger than the distributed reward, action selection remains effectively random; if it is too small, action selection fixes on the single action that happened to be reinforced first. To resolve these problems, an appropriate initial value must be set at each state, so we propose a method that sets the initial Q-value state by state. Experiments show that this method outperforms the conventional one.
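The abstract describes the mechanism only in prose, so the following is a minimal sketch of a profit-sharing agent, assuming roulette-wheel selection proportional to Q-values and a geometric credit-assignment function. The class name, GAMMA, and the uniform INIT_Q default are illustrative assumptions; the paper's actual per-state initialization rule is not reproduced here.

```python
import random
from collections import defaultdict

# Assumed illustration constants; the paper's actual values are not given here.
GAMMA = 0.5     # geometric decay used when sharing reward back along an episode
INIT_Q = 1.0    # a single uniform initial Q-value (the paper sets this per state)

class ProfitSharingAgent:
    """Minimal profit-sharing sketch with roulette-wheel action selection."""

    def __init__(self, actions, init_q=INIT_Q):
        self.actions = list(actions)
        # Every (state, action) rule starts at init_q.  If init_q is much
        # larger than the shared reward, selection stays near-uniform; if it
        # is near zero, the first rewarded rule dominates immediately.
        self.q = defaultdict(lambda: init_q)
        self.episode = []  # (state, action) rules fired in the current episode

    def select_action(self, state):
        # Roulette-wheel selection: probability proportional to the Q-value.
        weights = [self.q[(state, a)] for a in self.actions]
        action = random.choices(self.actions, weights=weights)[0]
        self.episode.append((state, action))
        return action

    def distribute(self, reward):
        # Profit sharing: share the terminal reward backward along the
        # episode with geometric decay, then start a fresh episode.
        credit = reward
        for rule in reversed(self.episode):
            self.q[rule] += credit
            credit *= GAMMA
        self.episode.clear()
```

With roulette-wheel selection, the ratio between the initial Q-value and the distributed reward directly controls how quickly the selection distribution departs from uniform, which is the trade-off between exploration and exploitation that the abstract describes.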
Pages: 106-109
Number of pages: 4