An improved Q-learning algorithm based on exploration region expansion strategy

Cited by: 0

Authors:

Gao, Qingji [1]

Hong, Bingong [1]

He, Zhendong [2]

Liu, Jie [2]

Niu, Guochen [2]

Affiliations:

[1] Harbin Inst Technol, Dept Comp Sci & Technol, Harbin 150001, Peoples R China

[2] Civil Aviat Univ China, Inst Res Robot, Tianjin 300300, Peoples R China

Source:

WCICA 2006: SIXTH WORLD CONGRESS ON INTELLIGENT CONTROL AND AUTOMATION, VOLS 1-12, CONFERENCE PROCEEDINGS | 2006

Keywords:

Q-learning; exploration region expansion; exploration-exploitation; Metropolis criterion;

DOI:

Not available

Chinese Library Classification:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

To address one of the key problems in Q-learning, namely balancing exploration and exploitation, an improved Q-learning algorithm based on an exploration region expansion strategy is proposed, building on Metropolis criterion-based Q-learning. This strategy eliminates blind exploration of the entire environment and increases learning efficiency. In addition, when the agent encounters an obstacle it seeks another feasible path, which makes the algorithm easy to implement on a real robot. An automatic termination condition is also proposed, which avoids redundant learning after the optimal path has been found and reduces learning time. Simulation experiments demonstrate the validity of the algorithm.
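
The abstract names Metropolis criterion-based Q-learning as the starting point but gives no formulas. As a rough illustration only, the sketch below shows one common way a Metropolis-style acceptance test can be used to choose between a random and a greedy action in tabular Q-learning. It does not implement the exploration region expansion strategy or the termination condition, which this record does not describe. The function names, the temperature parameter, and the default learning rate and discount factor are all assumptions made for the example.

```python
import math
import random


def metropolis_select(q_row, temperature, rng=random):
    """Pick an action index from one Q-table row with a Metropolis-style test.

    Hypothetical helper: q_row is a list of Q-values for the current state,
    temperature is a positive number controlling how often exploration wins.
    """
    greedy = max(range(len(q_row)), key=lambda a: q_row[a])
    candidate = rng.randrange(len(q_row))
    # Q[candidate] <= Q[greedy], so the exponent is <= 0 and the
    # acceptance probability lies in [0, 1].
    accept_prob = math.exp((q_row[candidate] - q_row[greedy]) / temperature)
    return candidate if rng.random() < accept_prob else greedy


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard one-step tabular Q-learning update (in place)."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
```

With a high temperature the random candidate is accepted almost always, so the agent explores; as the temperature approaches zero the rule reduces to greedy selection. Lowering the temperature over episodes is a typical, but here assumed, schedule.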

Pages: 4167 / +

Page count: 2

