An algorithm that excavates suboptimal states and improves Q-learning

Cited by: 0
Authors
Zhu, Canxin [1 ,2 ]
Yang, Jingmin [1 ,2 ]
Zhang, Wenjie [1 ,2 ]
Zheng, Yifeng [1 ,2 ]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Peoples R China
[2] Fuzhou Univ, Affiliated Prov Hosp, Fuzhou 363000, Fujian, Peoples R China
Source
ENGINEERING RESEARCH EXPRESS | 2024, Vol. 6, No. 4
Keywords
reinforcement learning; exploration and exploitation; Markov decision process; suboptimal state
DOI
10.1088/2631-8695/ad8dae
Chinese Library Classification (CLC)
T [Industrial Technology]
Subject classification code
08
Abstract
Reinforcement learning is inspired by the trial-and-error method of animal learning: the reward values obtained from the agent's interaction with the environment serve as feedback signals to train the agent. Reinforcement learning has attracted extensive attention in recent years. It is mainly used to solve sequential decision-making problems and has been applied to many areas of life, such as autonomous driving, gaming, and robotics. Exploration and exploitation are the main characteristics that distinguish reinforcement learning from other learning methods, and reinforcement learning methods need reward optimization algorithms to better balance the two. To address the unbalanced exploration and the large number of repeated explorations of the Q-learning algorithm in MDP environments, an algorithm that excavates suboptimal states and improves Q-learning is proposed. It adopts the exploration idea of 'exploring the potential of the second best': it explores the state with the suboptimal state value and computes the exploration probability from the distance between the current state and the goal state, so that the larger the distance, the greater the agent's demand for exploration. In addition, only the immediate reward and the maximum action value of the next state are needed to compute the Q value. Simulation experiments in two different MDP environments, FrozenLake8x8 and CliffWalking, verify that the proposed algorithm obtains the highest average cumulative reward and the lowest total time consumption.
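The record itself contains no code, but the two mechanisms the abstract names, suboptimal (second-best) exploration and a distance-scaled exploration probability, can be sketched briefly. The Python sketch below is a minimal illustration, not the authors' implementation: the grid layout, the Manhattan-distance metric, the linear mapping from distance to probability, and all constants (GRID, GOAL, ALPHA, GAMMA, p_min, p_max) are assumptions; only the one-step Q-learning update, which uses the immediate reward and the discounted maximum action value of the next state, is stated in the abstract.

import numpy as np

GRID = 8                      # FrozenLake8x8-style grid (assumed layout)
GOAL = (GRID - 1, GRID - 1)   # goal in the bottom-right corner (assumed)
N_STATES, N_ACTIONS = GRID * GRID, 4
ALPHA, GAMMA = 0.1, 0.99      # illustrative learning rate and discount factor

def explore_prob(state, p_min=0.05, p_max=0.5):
    # Exploration probability grows with Manhattan distance to the goal;
    # the linear mapping and the p_min/p_max bounds are assumptions.
    row, col = divmod(state, GRID)
    dist = abs(row - GOAL[0]) + abs(col - GOAL[1])
    return p_min + (p_max - p_min) * dist / ((GRID - 1) * 2)

def select_action(Q, state, rng):
    # 'Exploring the potential of the second best': with probability
    # explore_prob(state), take the second-ranked action instead of a
    # uniformly random one; otherwise act greedily.
    ranked = np.argsort(Q[state])[::-1]   # actions sorted by Q value, best first
    if rng.random() < explore_prob(state):
        return int(ranked[1])             # suboptimal (second-best) action
    return int(ranked[0])                 # greedy action

def update(Q, s, a, r, s_next):
    # One-step Q-learning target, as stated in the abstract: the immediate
    # reward plus the discounted maximum action value of the next state.
    Q[s, a] += ALPHA * (r + GAMMA * Q[s_next].max() - Q[s, a])

Q = np.zeros((N_STATES, N_ACTIONS))
rng = np.random.default_rng(0)

In a full agent, select_action and update would be called inside the usual episode loop of the chosen environment.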
Pages: 18