Reinforcement learning with internal expectation in the random neural networks for cascaded decisions

Cited by: 5
Authors
Halici, U [1]
Affiliations
[1] Middle E Tech Univ, Dept Elect & Elect Engn, Comp Vis & Artificial Neural Networks Res Lab, TR-06531 Ankara, Turkey
Keywords
neural networks; random neural networks; reinforcement learning; extinction; cascaded decisions;
DOI
10.1016/S0303-2647(01)00144-7
CLC number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
The reinforcement learning scheme proposed in Halici (J. Biosystems 40 (1997) 83) for the random neural network (RNN) (Neural Computation 1 (1989) 502) is based on reward and performs well in stationary environments. However, when the environment is not stationary, the scheme gets stuck on the previously learned action and extinction is not possible. To overcome this problem, the reinforcement scheme is extended in Halici (Eur. J. Oper. Res. 126 (2000) 288) by introducing a new weight update rule (E-rule) that takes into consideration the internal expectation of reinforcement. Although the E-rule is proposed for the RNN, it can be used for training learning automata or other intelligent systems based on reinforcement learning. This paper looks into the behavior of the learning scheme with internal expectation in environments where the reinforcement is obtained after a sequence of cascaded decisions. The simulation results show that the RNN learns well and that extinction is possible even in cases with several decision steps and hundreds of possible decision paths. © 2001 Elsevier Science Ireland Ltd. All rights reserved.
Pages: 21-34
Number of pages: 14