Improving Q-learning by using the agent's action history

Cited: 0
Authors
Saito M. [1 ]
Sekozawa T. [2 ]
Affiliations
[1] Graduate School of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
[2] Dept. of Information Systems Creation, Faculty of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
Keywords
Action history; Action selection; Machine learning; Q-learning; Reinforcement learning; Tabu search
DOI
10.1541/ieejeiss.136.1209
Abstract
Q-learning learns the optimal policy by updating a state-action value function (Q-value) through trial-and-error search so as to maximize the expected reward. A major issue, however, is its slow learning speed. We therefore add a technique in which the agent memorizes environmental information and uses it to update the Q-values of many states. By updating Q-values in a larger number of states, more information is given to the agent and the learning time is reduced. Furthermore, by incorporating the stored environmental information into the action selection method so that actions known to fail, such as those that cause learning to stagnate, are avoided, the learning speed in the early stage of learning is improved. In addition, we design a new action-area value function in order to explore many more states from the initial stage of learning. Finally, numerical examples solving a maze problem show the usefulness of the proposed method. © 2016 The Institute of Electrical Engineers of Japan.
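The abstract describes the method only at a high level. The following is a minimal sketch (not the authors' code) of how such an approach could look for a maze-style task: tabular Q-learning in which the agent memorizes its recent transitions, replays them so that the Q-values of many states are updated each step, and marks failed state-action pairs as tabu during action selection. The environment interface (env.reset(), env.step()), the constants, and the failure criterion are illustrative assumptions, not details taken from the paper.

    # Hedged sketch: Q-learning with an action-history replay and tabu-style action selection.
    # The maze environment "env" is assumed to provide reset() -> state and
    # step(action) -> (next_state, reward, done); all names and constants are illustrative.
    import random
    from collections import deque

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
    HISTORY_LEN = 10                        # how many recent transitions the agent memorizes

    def select_action(Q, state, actions, tabu):
        """Epsilon-greedy selection that skips actions marked tabu for this state."""
        allowed = [a for a in actions if (state, a) not in tabu] or list(actions)
        if random.random() < EPSILON:
            return random.choice(allowed)
        return max(allowed, key=lambda a: Q.get((state, a), 0.0))

    def q_learning_with_history(env, actions, episodes=200):
        Q, tabu = {}, set()
        for _ in range(episodes):
            state = env.reset()
            history = deque(maxlen=HISTORY_LEN)   # recent (s, a, r, s') transitions
            done = False
            while not done:
                action = select_action(Q, state, actions, tabu)
                next_state, reward, done = env.step(action)
                history.append((state, action, reward, next_state))
                if next_state == state:           # e.g. walking into a wall: treat as a failed action
                    tabu.add((state, action))
                # Standard one-step Q-learning update for the current transition ...
                best_next = max(Q.get((next_state, a), 0.0) for a in actions)
                td = reward + GAMMA * best_next - Q.get((state, action), 0.0)
                Q[(state, action)] = Q.get((state, action), 0.0) + ALPHA * td
                # ... plus a replay over the memorized history, so Q-values of many states
                # are updated per environment step.
                for s, a, r, s2 in reversed(history):
                    best = max(Q.get((s2, b), 0.0) for b in actions)
                    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (r + GAMMA * best - Q.get((s, a), 0.0))
                state = next_state
        return Q

The tabu set here stands in for the paper's idea of avoiding failure behaviors during action selection; the history replay stands in for updating the Q-value in many states from the memorized environmental information.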
Pages: 1209-1217
Page count: 8
Related papers
50 records in total
  • [21] Multi Q-Table Q-Learning
    Kantasewi, Nitchakun
    Marukatat, Sanparith
    Thainimit, Somying
    Manabu, Okumura
    2019 10TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY FOR EMBEDDED SYSTEMS (IC-ICTES), 2019,
  • [22] Multi Target Tracking using a Compact Q-Learning with a Teacher
    Saad, E. M.
    Awadalla, M. H.
    Hamdy, A. M.
    Ali, H. I.
    ICCES: 2008 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS, 2007, : 173 - 178
  • [23] Cooperative Multi-Agent Q-Learning Using Distributed MPC
    Esfahani, Hossein Nejatbakhsh
    Velni, Javad Mohammadpour
    IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 2193 - 2198
  • [24] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 26 (06) : 2771 - 2783
  • [25] Automated Portfolio Rebalancing using Q-learning
    Darapaneni, Narayana
    Basu, Amitavo
    Savla, Sanket
    Gururajan, Raamanathan
    Saquib, Najmus
    Singhavi, Sudarshan
    Kale, Aishwarya
    Bid, Pratik
    Paduri, Anwesh Reddy
    2020 11TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2020, : 596 - 602
  • [26] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25
  • [27] Q-learning based on neural network in learning action selection of mobile robot
    Qiao, Junfei
    Hou, Zhanjun
    Ruan, Xiaogang
    2007 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS, VOLS 1-6, 2007, : 263 - 267
  • [28] Improving Energy Efficiency and QoS of LPWANs for IoT Using Q-Learning Based Data Routing
    Pandey, Om Jee
    Yuvaraj, Tankala
    Paul, Joseph K.
    Nguyen, Ha H.
    Gundepudi, Karthikay
    Shukla, Mahendra K.
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2022, 8 (01) : 365 - 379
  • [29] CVaR Q-Learning
    Stanko, Silvestr
    Macek, Karel
    COMPUTATIONAL INTELLIGENCE: 11th International Joint Conference, IJCCI 2019, Vienna, Austria, September 17-19, 2019, Revised Selected Papers, 2021, 922 : 333 - 358
  • [30] Impact of Neighboring Agent's Characteristics with Q-Learning in Network Multi-agent System
    Kaur, Harjot
    Devi, Ginni
    ADVANCED INFORMATICS FOR COMPUTING RESEARCH, ICAICR 2018, PT I, 2019, 955 : 744 - 756