DISCRETIZED ESTIMATOR LEARNING AUTOMATA

被引:57
作者
LANCTOT, JK [1 ]
OOMMEN, BJ [1 ]
机构
[1] MITEL CORP,KANATA K2K 1X3,ON,CANADA
来源
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS | 1992年 / 22卷 / 06期
基金
加拿大自然科学与工程研究理事会;
关键词
721.1 Computer Theory; Includes Computational Logic; Automata Theory; Switching Theory; Programming Theory - 723.1 Computer Programming - 723.4 Artificial Intelligence - 921.6 Numerical Methods;
D O I
10.1109/21.199471
中图分类号
TP3 [计算技术、计算机技术];
学科分类号
0812 ;
摘要
Learning automata are stochastic automata interacting with an unknown random environment. The fundamental problem is that of learning, through feedback, the action that has the highest probability of being rewarded by the environment. A class of algorithms known as estimator algorithms are presently among the fastest known. They are characterized by the use of running estimates of the probabilities of each possible action being rewarded. The improvements gained by rendering the various estimator algorithms discrete are investigated. This is done by restricting the probability of selecting an action to a finite, and hence, discrete subset of [0,1]. This modification is proven to be epsilon-optimal in all stationary environments. In the paper, various discretized estimator algorithms (DEA's) are constructed. Subsequently, members of the family of DEA's will be shown to be epsilon-optimal by deriving two sufficient conditions required for the epsilon-optimality-the properties of monotonicity and moderation. There is a conjecture presented about the necessity of these conditions for epsilon-optimality too. Experimental results indicate that the discrete modifications improve the performance of these algorithms to the extent that these automata constitute the fastest converging and most accurate learning automata reported to date.
引用
收藏
页码:1473 / 1483
页数:11
相关论文
共 33 条
  • [1] BABA S, 1980, INT J SYST SCI, V11, P1447
  • [2] KARLIN S, 1974, 1ST COURSE STOCHASTI
  • [3] LAKSHMIVARAHAN S, 1973, IEEE T SYST MAN CYB, VSMC3, P281
  • [4] LAKSHMIVARAHAN S, 1981, APPL MATH COMPUT, V8, P51, DOI 10.1016/0096-3003(81)90035-7
  • [5] Lakshmivarahan S, 1981, LEARNING ALGORITHMS
  • [6] LANCTOT JK, 1989, THESIS CARLETON U OT
  • [7] MEYBODI MR, 1983, 3RD P YAL WORKSH APP
  • [8] MEYBODI MR, THESIS U OKLA NORMAN
  • [9] MUKHOPADHYAG S, 1989, IEEE T SYST MAN SEP, P1008
  • [10] Narendra K. S., 1977, Journal of Cybernetics and Information Science, V1, P53