EPSILON-OPTIMAL DISCRETIZED LINEAR REWARD-PENALTY LEARNING AUTOMATA

被引:43
作者
OOMMEN, BJ [1 ]
CHRISTENSEN, JPR [1 ]
机构
[1] UNIV COPENHAGEN,INST MATH,DK-2100 COPENHAGEN,DENMARK
来源
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS | 1988年 / 18卷 / 03期
关键词
PROBABILITY - SYSTEMS SCIENCE AND CYBERNETICS -- Learning Systems;
D O I
10.1109/21.7494
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Variable-structure stochastic automata (VSSA) are considered which interact with an environment and which dynamically learn the optimal action that the environment offers. Like all VSSA the automata are fully defined by a set of action-probability updating rules. However, to minimize the requirements on the random-number generator used to implement the VSSA, and to increase the speed of convergence of the automaton, the case in which the probability-updating functions can assume only a finite number of values. These values discretize the probability space [0, 1] and hence they are called discretized learning automata. The discretized automata are linear because the subintervals of [0, 1] are of equal length. The authors prove the following results: (a) two-action discretized linear reward-penalty automata are ergodic and Ε-optimal in all environments whose minimum penalty probability is less than 0.5; (b) there exist discretized two-action linear reward-penalty automata that are ergodic and Ε-optimal in all random environments, and (c) discretized two-action linear reward-penalty automata with artifically created absorbing barriers are Ε-optimal in all random environments.
引用
收藏
页码:451 / 458
页数:8
相关论文
共 22 条
  • [1] Flerov Y. A., 1972, Journal of Cybernetics, V2, P112, DOI 10.1080/01969727208542916
  • [2] Isaacson DL, 1976, MARKOV CHAINS THEORY
  • [3] LAKSHMIVARAHAN S, 1973, IEEE T SYST MAN CYB, VSMC3, P281
  • [4] LAKSHMIVARAHAN S, 1981, APPL MATH COMPUT, V8, P51, DOI 10.1016/0096-3003(81)90035-7
  • [5] Lakshmivarahan S, 1981, LEARNING ALGORITHMS
  • [6] LAKSHMIVARAHAN S, 1979, EECS7901 U OKL SCH E
  • [7] MEYBODI MR, THESIS U OKLAHOMA NO
  • [8] LEARNING AUTOMATA - SURVEY
    NARENDRA, KS
    THATHACHAR, MA
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1974, SMC4 (04): : 323 - 334
  • [9] APPLICATION OF LEARNING AUTOMATA TO TELEPHONE TRAFFIC ROUTING AND CONTROL
    NARENDRA, KS
    WRIGHT, EA
    MASON, LG
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1977, 7 (11): : 785 - 792
  • [10] ON THE BEHAVIOR OF A LEARNING AUTOMATON IN A CHANGING ENVIRONMENT WITH APPLICATION TO TELEPHONE TRAFFIC ROUTING
    NARENDRA, KS
    THATHACHAR, MAL
    [J]. IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS, 1980, 10 (05): : 262 - 269