Improving Q-learning by using the agent's action history

Cited by: 0
Authors
Saito M. [1 ]
Sekozawa T. [2 ]
Affiliations
[1] Graduate School of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
[2] Dept. Information Systems Creation, Faculty of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
Keywords
Action history; Action select; Machine learning; Q-learning; Reinforcement learning; Tabu search;
DOI
10.1541/ieejeiss.136.1209
Abstract
Q-learning learns an optimal policy by updating the action-state value function (Q-value) through trial-and-error search so as to maximize the expected reward. A major issue, however, is its slow learning speed. We therefore add a technique in which the agent memorizes environmental information and uses it to update the Q-value in many states; by updating the Q-value in a larger number of states, more information is given to the agent and the learning time can be reduced. Furthermore, by incorporating the stored environmental information into the action selection method, the agent avoids failure behaviors such as stagnation of learning, which improves the learning speed in the initial stage of learning. In addition, we design a new action area value function in order to explore many more states from the initial stage of learning. Finally, numerical examples solving a maze problem show the usefulness of the proposed method. © 2016 The Institute of Electrical Engineers of Japan.
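As a rough illustration of the idea summarized in the abstract, the sketch below combines a standard tabular Q-learning update with a tabu-style action history: actions that recently led to failure in a state are temporarily excluded from epsilon-greedy selection. The environment interface (reset, step), the flag indicating a failed action, the tabu length, and all parameter values are illustrative assumptions, not the paper's exact formulation.

import random
from collections import defaultdict, deque

# Illustrative sketch only: plain tabular Q-learning plus a per-state
# "action history" (tabu list) that temporarily blocks actions which
# recently led to failure. The env interface is assumed to be
# reset() -> state and step(a) -> (next_state, reward, done, failed).

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
TABU_LEN = 3  # how many recently failed actions to avoid per state (assumed)

def train(env, n_actions, episodes=500):
    Q = defaultdict(lambda: [0.0] * n_actions)
    tabu = defaultdict(lambda: deque(maxlen=TABU_LEN))  # action history per state

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy selection restricted to non-tabu actions
            allowed = [a for a in range(n_actions) if a not in tabu[s]] or list(range(n_actions))
            if random.random() < EPSILON:
                a = random.choice(allowed)
            else:
                a = max(allowed, key=lambda x: Q[s][x])

            s2, r, done, failed = env.step(a)

            # standard one-step Q-learning update
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])

            # remember failed actions so they are avoided next time in this state
            if failed:
                tabu[s].append(a)
            s = s2
    return Q

In this sketch the tabu list plays the role of the memorized action history: it narrows action selection away from known failures early in training, while the ordinary Q-update remains unchanged.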
Pages: 1209-1217
Number of pages: 8
Related papers
50 items in total
  • [31] Multi-Target Tracking using a Compact Q-Learning with a Teacher
    Saad, E. M.
    Awadalla, M. H.
    Hamdy, A. M.
    Ali, H. I.
    NRSC: 2009 NATIONAL RADIO SCIENCE CONFERENCE: NRSC 2009, VOLS 1 AND 2, 2009, : 284 - 295
  • [32] A distributed Q-learning algorithm for multi-agent team coordination
    Huang, J
    Yang, B
    Liu, DY
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 108 - 113
  • [33] Periodic Q-Learning
    Lee, Donghwan
    He, Niao
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 582 - 598
  • [34] A STUDY ON MODELING AND ANALYSIS OF AGENT-BASED SIMULATIONS WITH Q-LEARNING
    Nakano, Nobuhide
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2011, 7 (01): : 51 - 60
  • [35] A Comparative Study of Opponent Type Effects on Speed of Learning for an Adversarial Q-Learning Agent
    Zamstein, Lavi M.
    Smith, Brandt A.
    Hodhod, Rania
    2019 IEEE SOUTHEASTCON, 2019,
  • [36] Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks
    Jiang, Haobo
    Li, Guangyu
    Xie, Jin
    Yang, Jian
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 5269 - 5279
  • [37] An Online Home Energy Management System using Q-Learning and Deep Q-Learning
    Izmitligil, Hasan
    Karamancioglu, Abdurrahman
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2024, 43
  • [38] Agent-based optimization for multiple signalized intersections using q-learning
    Teo, Kenneth Tze Kin
    Yeo, Kiam Beng
    Chin, Yit Kwong
    Chuo, Helen Sin Ee
    Tan, Min Keng
    International Journal of Simulation: Systems, Science and Technology, 2014, 15 (06): : 90 - 96
  • [39] Continuous Q-Learning for Multi-Agent Cooperation
    Hwang, Kao-Shing
    Jiang, Wei-Cheng
    Lin, Yu-Hong
    Lai, Li-Hsin
    CYBERNETICS AND SYSTEMS, 2012, 43 (03) : 227 - 256
  • [40] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 1151 - 1158