An algorithm that excavates suboptimal states and improves Q-learning

Cited: 0
Authors
Zhu, Canxin [1 ,2 ]
Yang, Jingmin [1 ,2 ]
Zhang, Wenjie [1 ,2 ]
Zheng, Yifeng [1 ,2 ]
Affiliations
[1] Minnan Normal Univ, Sch Comp Sci, Zhangzhou 363000, Peoples R China
[2] Fuzhou Univ, Affiliated Prov Hosp, Fuzhou 363000, Fujian, Peoples R China
Source
ENGINEERING RESEARCH EXPRESS | 2024, Vol. 6, No. 04
Keywords
reinforcement learning; exploration and exploitation; Markov decision process; suboptimal state
DOI
10.1088/2631-8695/ad8dae
Chinese Library Classification
T [Industrial Technology]
Discipline code
08
Abstract
Reinforcement learning is inspired by the trial-and-error method of animal learning: the reward obtained from the agent's interaction with the environment serves as a feedback signal to train the agent. Reinforcement learning has attracted extensive attention in recent years. It is mainly used to solve sequential decision-making problems and has been applied to many areas of life, such as autonomous driving, gaming, and robotics. Exploration and exploitation are the main characteristics that distinguish reinforcement learning from other learning methods, and reinforcement learning methods need reward-optimization algorithms to better balance the two. Aiming at the problems of unbalanced exploration and a large number of repeated explorations that the Q-learning algorithm exhibits in MDP environments, an algorithm that excavates suboptimal states and improves Q-learning is proposed. It adopts the exploration idea of 'exploring the potential of the second best': it explores the state with the suboptimal state value and calculates the exploration probability from the distance between the current state and the goal state; the larger the distance, the higher the agent's demand for exploration. In addition, only the immediate reward and the maximum action value of the next state are needed to calculate the Q value. Simulation experiments in two different MDP environments, FrozenLake8x8 and CliffWalking, verify that the proposed algorithm obtains the highest average cumulative reward with the least total time consumption.
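The two mechanisms the abstract names can be sketched in a few lines: the standard tabular Q-learning update (immediate reward plus the discounted maximum action value of the next state) and an exploration probability that grows with the distance to the goal. The abstract does not give the authors' exact probability formula, so the linear scaling below (with assumed bounds `eps_min` and `eps_max`) is an illustrative assumption, not the paper's method.

```python
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Tabular update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).

    Only the immediate reward r and the maximum action value of the next
    state are needed, as stated in the abstract.
    """
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])


def exploration_prob(dist_to_goal, max_dist, eps_min=0.05, eps_max=0.9):
    """Assumed linear scaling: the farther the agent is from the goal,
    the higher its exploration probability."""
    return eps_min + (eps_max - eps_min) * (dist_to_goal / max_dist)


# Tiny usage example on a two-state, two-action table.
actions = ["left", "right"]
Q = {s: {a: 0.0 for a in actions} for s in range(2)}
q_update(Q, s=0, a="right", r=1.0, s_next=1)  # first update stores alpha * r
```

With an all-zero table, the first update leaves `Q[0]["right"] = 0.1` (alpha times the immediate reward), and `exploration_prob` ranges from 0.05 at the goal to 0.9 at maximum distance.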
Pages: 18
Related papers
50 in total
  • [31] State and Action Space Segmentation Algorithm in Q-learning
    Notsu, Akira
    Ichihashi, Hidetomo
    Honda, Katsuhiro
    2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8, 2008, : 2384 - 2389
  • [32] A Hybrid Fuzzy Q-Learning algorithm for robot navigation
    Gordon, Sean W.
    Reyes, Napoleon H.
    Barczak, Andre
    2011 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2011, : 2625 - 2631
  • [33] Power Control Algorithm Based on Q-Learning in Femtocell
    Li Yun
    Tang Ying
    Liu Hanxiao
    JOURNAL OF ELECTRONICS & INFORMATION TECHNOLOGY, 2019, 41 (11) : 2557 - 2564
  • [34] I2Q: A Fully Decentralized Q-Learning Algorithm
    Jiang, Jiechuan
    Lu, Zongqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35 (NEURIPS 2022), 2022,
  • [35] Q-learning embedded sine cosine algorithm (QLESCA)
    Hamad, Qusay Shihab
    Samma, Hussein
    Suandi, Shahrel Azmin
    Mohamad-Saleh, Junita
    EXPERT SYSTEMS WITH APPLICATIONS, 2022, 193
  • [36] Ramp Metering Control Based on the Q-Learning Algorithm
    Ivanjko, Edouard
    Necoska, Daniela Koltovska
    Greguric, Martin
    Vujic, Miroslav
    Jurkovic, Goran
    Mandzuka, Sadko
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2015, 15 (05) : 88 - 97
  • [37] Sink Attraction Q-Learning Routing Algorithm For UWSNs
    Zhang, Zhi
    Li, Yibing
    Gao, Jialiang
    Ye, Fang
    2024 IEEE/CIC INTERNATIONAL CONFERENCE ON COMMUNICATIONS IN CHINA, ICCC, 2024,
  • [38] An improved Q-learning algorithm using synthetic pheromones
    Monekosso, N
    Remagnino, P
    Szarowicz, A
    FROM THEORY TO PRACTICE IN MULTI-AGENT SYSTEMS, 2002, 2296 : 197 - 206
  • [39] Integrated Q-Learning with Firefly Algorithm for Transportation Problems
    Pratiba, K. R.
    Ridhanya, S.
    Ridhisha, J.
    Hemashree, P.
    EAI Endorsed Transactions on Energy Web, 2024, 11 : 1 - 6
  • [40] Fundamental Q-learning Algorithm in Finding Optimal Policy
    Sun, Canyu
    2017 INTERNATIONAL CONFERENCE ON SMART GRID AND ELECTRICAL AUTOMATION (ICSGEA), 2017, : 243 - 246