Improving Q-learning by using the agent's action history

Cited by: 0
Authors
Saito M. [1 ]
Sekozawa T. [2 ]
Affiliations
[1] Graduate School of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
[2] Dept. Information Systems Creation, Faculty of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
Keywords
Action history; Action selection; Machine learning; Q-learning; Reinforcement learning; Tabu search
DOI
10.1541/ieejeiss.136.1209
Abstract
Q-learning learns an optimal policy by updating an action-value function (the Q-value) through trial-and-error search so as to maximize the expected reward. A major issue, however, is its slow learning speed. We therefore add a technique in which the agent memorizes environmental information and uses it to update the Q-values of many states. Updating the Q-values of multiple states per step gives the agent more information and reduces learning time. Furthermore, by incorporating the stored environmental information into the action-selection method so that failing actions, which cause learning to stagnate, are avoided, the learning speed in the initial stage of learning is improved. In addition, we design a new action-area value function in order to explore many more states from the initial stage of learning. Finally, numerical examples solving a maze problem show the usefulness of the proposed method. © 2016 The Institute of Electrical Engineers of Japan.
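The abstract does not give the concrete update or selection rules, so the following is a minimal sketch of the general idea in Python: tabular Q-learning on a toy maze in which the agent remembers actions that failed (here, hypothetically, actions that leave the state unchanged, such as walking into a wall) and excludes them from epsilon-greedy selection, in the spirit of tabu search. The maze layout, reward values, and the tabu rule are all illustrative assumptions, not the paper's method.

```python
# Sketch of Q-learning with a tabu-style action history (illustrative only).
# Assumption: a "failed" action is one that leaves the state unchanged,
# e.g. walking into a maze wall or off the grid.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

MAZE = [                                        # S = start, G = goal, # = wall
    "S..#",
    ".#..",
    "...#",
    "#..G",
]

def step(state, action):
    """Apply an action; hitting a wall or the border leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        state = (r, c)
    done = MAZE[state[0]][state[1]] == "G"
    return state, (1.0 if done else -0.01), done

Q = defaultdict(float)      # (state, action) -> Q-value
tabu = defaultdict(set)     # state -> actions remembered as failures

def select_action(state):
    """Epsilon-greedy choice restricted to non-tabu actions (assumed rule)."""
    allowed = [a for a in ACTIONS if a not in tabu[state]] or ACTIONS
    if random.random() < EPSILON:
        return random.choice(allowed)
    return max(allowed, key=lambda a: Q[(state, a)])

for episode in range(500):
    state = (0, 0)
    for _ in range(200):                        # cap steps so episodes always end
        action = select_action(state)
        nxt, reward, done = step(state, action)
        if nxt == state:                        # failed action: remember and avoid it here
            tabu[state].add(action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        if done:
            break

print(max(Q[((0, 0), a)] for a in ACTIONS))     # learned value at the start state
```

A permanent tabu set is a simplification; classical tabu search uses a limited tabu tenure, which would let the agent reconsider an action after some number of steps.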
Pages: 1209-1217
Number of pages: 8