The Hierarchical Discrete Learning Automaton Suitable for Environments with Many Actions and High Accuracy Requirements

Cited by: 1

Authors:

Omslandseter, Rebekka Olsson ^{[1]}

Jiao, Lei ^{[1]}

Zhang, Xuan ^{[2]}

Yazidi, Anis ^{[3]}

Oommen, B. John ^{[1,4]}

Affiliations:

[1] Univ Agder, Dept Informat & Commun Technol, N-4879 Grimstad, Norway

[2] Norwegian Res Ctr NORCE, N-4879 Grimstad, Norway

[3] Oslo Metropolitan Univ, N-0167 Oslo, Norway

[4] Carleton Univ, Ottawa, ON, Canada

Source:

AI 2021: ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022 / Vol. 13151

Keywords:

Reinforcement learning; Learning Automata; Hierarchical discrete pursuit LA; ALGORITHMS;

DOI:

10.1007/978-3-030-97546-3_41

CLC number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Since its inception, the Learning Automata (LA) paradigm has attracted much interest. Over the last decades, new concepts and various improvements have been introduced to increase the LA's speed and accuracy, including employing probability updating functions, discretizing the probability space, and implementing the "Pursuit" concept. Incorporating "structure" into the ordering of the LA's actions is one of the latest advancements in the field, leading to the epsilon-optimal Hierarchical Continuous Pursuit LA (HCPA), which outperforms other LA variants when the number of actions is large. Although the previously proposed HCPA is powerful, its speed is handicapped when the required probability of an action approaches unity. The reason for this slow convergence is that the learning parameter operates multiplicatively within the probability space, so the increment of the action probability becomes smaller as the probability gets close to unity. Therefore, we propose the novel Hierarchical Discrete Learning Automata (HDPA) in this paper, which does not possess the same impediment as the HCPA. The proposed machine infuses the principle of discretization into the action probability vector's updating functionality. This type of updating is invoked recursively at every depth within a hierarchical tree structure, and the best estimated action is pursued in all iterations by utilizing the Estimator phenomenon. The proposed machine is epsilon-optimal, and our experimental results demonstrate that the number of iterations required before convergence is significantly reduced for the HDPA when compared with the HCPA.
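The slowdown described in the abstract can be illustrated with a deliberately simplified sketch. It is not the HCPA or the HDPA: there is no hierarchy, no estimator, and no environment. It assumes that one action is pursued and rewarded at every iteration, and it only compares a multiplicative update, p ← p + λ(1 − p), with a fixed-step discretized update, p ← min(p + 1/N, 1). The function names, the learning parameter `lam`, and the resolution `n_levels` are hypothetical choices made for illustration.

```python
def steps_continuous(p0, lam, threshold):
    """Iterations until p >= threshold under p <- p + lam * (1 - p).

    The increment lam * (1 - p) shrinks as p approaches 1.
    """
    p, steps = p0, 0
    while p < threshold:
        p = p + lam * (1.0 - p)
        steps += 1
    return steps


def steps_discrete(k0, n_levels, threshold):
    """Iterations until k / n_levels >= threshold under k <- min(k + 1, n_levels).

    The probability is stored as an integer level k, so every step
    adds the same amount, 1 / n_levels, regardless of how close p is to 1.
    """
    k, steps = k0, 0
    while k / n_levels < threshold:
        k = min(k + 1, n_levels)
        steps += 1
    return steps


if __name__ == "__main__":
    # Assumed setting: both schemes start at p = 0.5, with lam = 1 / n_levels
    # so that the first increments are of comparable size.
    for threshold in (0.9, 0.99, 0.999):
        c = steps_continuous(0.5, 0.01, threshold)
        d = steps_discrete(50, 100, threshold)
        print(f"threshold={threshold}: continuous={c} steps, discrete={d} steps")
```

Under these assumptions the discretized count is bounded by the number of remaining levels, whereas the multiplicative count keeps growing as the threshold is pushed toward unity.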

Pages: 507-518

Number of pages: 12

