Reinforcement learning with internal expectation for the random neural network

Cited by: 35
Authors
Halici, U [1]
Affiliations
[1] Middle E Tech Univ, Dept Elect & Elect Engn, Comp Vis & Artificial Neural Networks Res Lab, TR-06531 Ankara, Turkey
Keywords
random neural networks; reinforcement learning; punishment; extinction; expectation
DOI
10.1016/S0377-2217(99)00479-8
Chinese Library Classification number
C93 [Management Science]
Discipline classification code
12; 1201; 1202; 120202
Abstract
The reinforcement learning scheme proposed in Halici (1997) (Halici, U., 1997. Journal of Biosystems 40 (1/2), 83-91) for the random neural network (Gelenbe, E., 1989b. Neural Computation 1 (4), 502-510) is based on reward and performs well in stationary environments. However, when the environment is not stationary, it tends to get stuck on the previously learned action, and extinction is not possible. In this paper, the reinforcement learning scheme is extended by introducing a weight update rule that takes into account the internal expectation of reinforcement. With the proposed scheme, the system behaves as in learning with reward when the reward for the learned action is not below the internal expectation; otherwise it behaves as in learning with punishment, so that other possibilities can be explored. This scheme makes extinction possible while still converging well to the most rewarding action. (C) 2000 Elsevier Science B.V. All rights reserved.
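The switching rule described in the abstract can be illustrated with a toy sketch. This is not the paper's weight update rule for the random neural network: it assumes a plain table of scalar action weights, a greedy action choice, and an internal expectation kept as an exponential moving average of past reinforcement. The class `ExpectationAgent`, the helper `run`, and the parameters `step` and `decay` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExpectationAgent:
    """Toy agent (assumption): one scalar weight per action, not a random neural network."""
    n_actions: int
    step: float = 0.1             # hypothetical weight step size
    decay: float = 0.5            # hypothetical smoothing factor for the expectation
    use_expectation: bool = True  # False gives a reward-only learner for comparison
    expectation: float = 0.0      # internal expectation of reinforcement
    weights: list = field(default_factory=list)

    def __post_init__(self):
        if not self.weights:
            self.weights = [1.0] * self.n_actions

    def choose(self) -> int:
        # Greedy choice; ties go to the lowest index so the sketch is deterministic.
        return max(range(self.n_actions), key=lambda a: (self.weights[a], -a))

    def update(self, action: int, reward: float) -> None:
        if not self.use_expectation or reward >= self.expectation:
            # Reward mode: strengthen the chosen action in proportion to the reward.
            self.weights[action] += self.step * reward
        else:
            # Punishment mode: reward fell below the expectation, so weaken the action.
            self.weights[action] = max(0.0, self.weights[action] - self.step)
        # Move the internal expectation toward the reinforcement just received.
        self.expectation = self.decay * self.expectation + (1 - self.decay) * reward


def run(agent, phases):
    """phases: list of (n_steps, reward_per_action). Returns the actions taken."""
    history = []
    for n_steps, rewards in phases:
        for _ in range(n_steps):
            action = agent.choose()
            agent.update(action, rewards[action])
            history.append(action)
    return history


# Non-stationary environment: action 0 pays off first, then action 1 does.
phases = [(20, [1.0, 0.2]), (40, [0.0, 1.0])]
with_expectation = run(ExpectationAgent(2), phases)
reward_only = run(ExpectationAgent(2, use_expectation=False), phases)
```

In this sketch the reward-only learner keeps choosing action 0 after the environment changes, because a zero reward never alters its weights, while the expectation-based learner punishes action 0 once its reward drops below the expectation and eventually moves to action 1.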
Pages: 288-307
Page count: 20