Improving Q-learning by using the agent's action history

Cited by: 0
Authors
Saito M. [1 ]
Sekozawa T. [2 ]
Affiliations
[1] Graduate School of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
[2] Dept. Information Systems Creation, Faculty of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
Keywords
Action history; Action select; Machine learning; Q-learning; Reinforcement learning; Tabu search;
DOI
10.1541/ieejeiss.136.1209
Abstract
Q-learning learns an optimal policy by updating the action-state value function (Q-value) through trial-and-error search so as to maximize the expected reward. A major issue, however, is its slow learning speed. We therefore add a technique in which the agent memorizes environmental information and uses it to update the Q-value in many states; by updating the Q-value in a larger number of states, more information is given to the agent and the learning time can be reduced. Furthermore, by incorporating the stored environmental information into the action selection method, the agent avoids failure behaviors such as stagnation of learning, which improves the learning speed in the initial stage of learning. In addition, we design a new action area value function in order to explore many more states from the initial stage of learning. Finally, numerical examples solving a maze problem show the usefulness of the proposed method. © 2016 The Institute of Electrical Engineers of Japan.
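As a rough illustration of the idea summarized in the abstract, the sketch below combines a standard tabular Q-learning update with a tabu-style action history: actions that recently led to failure in a state are temporarily excluded from epsilon-greedy selection. The environment interface (reset, step), the flag indicating a failed action, the tabu length, and all parameter values are illustrative assumptions, not the paper's exact formulation.

import random
from collections import defaultdict, deque

# Illustrative sketch only: plain tabular Q-learning plus a per-state
# "action history" (tabu list) that temporarily blocks actions which
# recently led to failure. The env interface is assumed to be
# reset() -> state and step(a) -> (next_state, reward, done, failed).

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
TABU_LEN = 3  # how many recently failed actions to avoid per state (assumed)

def train(env, n_actions, episodes=500):
    Q = defaultdict(lambda: [0.0] * n_actions)
    tabu = defaultdict(lambda: deque(maxlen=TABU_LEN))  # action history per state

    for _ in range(episodes):
        s = env.reset()
        done = False
        while not done:
            # epsilon-greedy selection restricted to non-tabu actions
            allowed = [a for a in range(n_actions) if a not in tabu[s]] or list(range(n_actions))
            if random.random() < EPSILON:
                a = random.choice(allowed)
            else:
                a = max(allowed, key=lambda x: Q[s][x])

            s2, r, done, failed = env.step(a)

            # standard one-step Q-learning update
            Q[s][a] += ALPHA * (r + GAMMA * max(Q[s2]) - Q[s][a])

            # remember failed actions so they are avoided next time in this state
            if failed:
                tabu[s].append(a)
            s = s2
    return Q

In this sketch the tabu list plays the role of the memorized action history: it narrows action selection away from known failures early in training, while the ordinary Q-update remains unchanged.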
Pages: 1209-1217
Number of pages: 8
Related papers
50 items in total
  • [31] Multi-Target Tracking using a Compact Q-Learning with a Teacher
    Saad, E. M.
    Awadalla, M. H.
    Hamdy, A. M.
    Ali, H. I.
    NRSC: 2009 NATIONAL RADIO SCIENCE CONFERENCE: NRSC 2009, VOLS 1 AND 2, 2009, : 284 - 295
  • [32] A distributed Q-learning algorithm for multi-agent team coordination
    Huang, J
    Yang, B
    Liu, DY
    Proceedings of 2005 International Conference on Machine Learning and Cybernetics, Vols 1-9, 2005, : 108 - 113
  • [33] Periodic Q-Learning
    Lee, Donghwan
    He, Niao
    LEARNING FOR DYNAMICS AND CONTROL, VOL 120, 2020, 120 : 582 - 598
  • [34] A STUDY ON MODELING AND ANALYSIS OF AGENT-BASED SIMULATIONS WITH Q-LEARNING
    Nakano, Nobuhide
    INTERNATIONAL JOURNAL OF INNOVATIVE COMPUTING INFORMATION AND CONTROL, 2011, 7 (01): : 51 - 60
  • [35] A Comparative Study of Opponent Type Effects on Speed of Learning for an Adversarial Q-Learning Agent
    Zamstein, Lavi M.
    Smith, Brandt A.
    Hodhod, Rania
    2019 IEEE SOUTHEASTCON, 2019,
  • [36] Action Candidate Driven Clipped Double Q-Learning for Discrete and Continuous Action Tasks
    Jiang, Haobo
    Li, Guangyu
    Xie, Jin
    Yang, Jian
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (04) : 5269 - 5279
  • [37] An Online Home Energy Management System using Q-Learning and Deep Q-Learning
    Izmitligil, Hasan
    Karamancioglu, Abdurrahman
    SUSTAINABLE COMPUTING-INFORMATICS & SYSTEMS, 2024, 43
  • [38] Agent-based optimization for multiple signalized intersections using q-learning
    Teo, Kenneth Tze Kin
    Yeo, Kiam Beng
    Chin, Yit Kwong
    Chuo, Helen Sin Ee
    Tan, Min Keng
    International Journal of Simulation: Systems, Science and Technology, 2014, 15 (06): : 90 - 96
  • [39] Continuous Q-Learning for Multi-Agent Cooperation
    Hwang, Kao-Shing
    Jiang, Wei-Cheng
    Lin, Yu-Hong
    Lai, Li-Hsin
    CYBERNETICS AND SYSTEMS, 2012, 43 (03) : 227 - 256
  • [40] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 1151 - 1158