Improving Q-learning by using the agent's action history

Cited by: 0
Authors
Saito M. [1 ]
Sekozawa T. [2 ]
Affiliations
[1] Graduate School of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
[2] Dept. Information Systems Creation, Faculty of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
Keywords
Action history; Action selection; Machine learning; Q-learning; Reinforcement learning; Tabu search
DOI
10.1541/ieejeiss.136.1209
Abstract
Q-learning learns an optimal policy by updating a state-action value function (Q-value) through trial-and-error search so as to maximize the expected reward. However, a major issue is its slow learning speed. We therefore add a technique in which the agent memorizes environmental information and uses it to update the Q-value across many states. Updating the Q-value in a larger number of states gives the agent more information and reduces learning time. Furthermore, by incorporating the stored environmental information into the action selection method so that failure actions, such as those that cause learning to stagnate, are avoided, the learning speed in the initial stage of learning is improved. In addition, we design a new action-area value function so that many more states are explored from the initial stage of learning. Finally, numerical experiments on a maze problem demonstrate the usefulness of the proposed method. © 2016 The Institute of Electrical Engineers of Japan.
Pages: 1209 - 1217
Number of pages: 8
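The abstract describes augmenting Q-learning with memorized environmental information and a tabu-search-style action selection that avoids previously failed actions. The sketch below illustrates only that selection idea under stated assumptions; the paper's multi-state Q-value update and its action-area value function are not reproduced, and the class name `TabuQLearner` and parameters such as `tabu_len` are illustrative, not the authors' implementation.

```python
import random
from collections import defaultdict, deque

class TabuQLearner:
    """Minimal sketch: epsilon-greedy Q-learning with a tabu-style action
    history that masks recently failed (state, action) pairs during selection.
    Names and details are illustrative assumptions, not the paper's method."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1, tabu_len=50):
        self.q = defaultdict(float)          # Q(s, a), defaults to 0.0
        self.actions = actions               # list of discrete actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.tabu = deque(maxlen=tabu_len)   # recent (state, action) failures

    def select_action(self, state):
        # Exclude actions recorded as failures in this state, when possible.
        allowed = [a for a in self.actions if (state, a) not in self.tabu]
        if not allowed:
            allowed = self.actions
        if random.random() < self.epsilon:
            return random.choice(allowed)
        return max(allowed, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, failed=False):
        # Standard one-step Q-learning update.
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        td_target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (td_target - self.q[(state, action)])
        # Remember failed actions (e.g., hitting a wall in a maze) so that
        # action selection avoids repeating them in the same state.
        if failed:
            self.tabu.append((state, action))
```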
Related papers
50 records in total
  • [41] Double Q-learning Agent for Othello Board Game
    Somasundaram, Thamarai Selvi
    Panneerselvam, Karthikeyan
    Bhuthapuri, Tarun
    Mahadevan, Harini
    Jose, Ashik
    2018 10TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC), 2018, : 216 - 223
  • [42] Learning to Play Pac-Xon with Q-Learning and Two Double Q-Learning Variants
    Schilperoort, Jits
    Mak, Ivar
    Drugan, Madalina M.
    Wiering, Marco A.
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 1151 - 1158
  • [43] Double action Q-learning for obstacle avoidance in a dynamically changing environment
    Ngai, DCK
    Yung, NHC
    2005 IEEE Intelligent Vehicles Symposium Proceedings, 2005, : 211 - 216
  • [44] Oil Production Optimization Using Q-Learning Approach
    Zahedi-Seresht, Mazyar
    Sadeghi Bigham, Bahram
    Khosravi, Shahrzad
    Nikpour, Hoda
    PROCESSES, 2024, 12 (01)
  • [45] Model based path planning using Q-Learning
    Sharma, Avinash
    Gupta, Kanika
    Kumar, Anirudha
    Sharma, Aishwarya
    Kumar, Rajesh
    2017 IEEE INTERNATIONAL CONFERENCE ON INDUSTRIAL TECHNOLOGY (ICIT), 2017, : 837 - 842
  • [46] BEAM MANAGEMENT SOLUTION USING Q-LEARNING FRAMEWORK
    Araujo, Daniel C.
    de Almeida, Andre L. F.
    2019 IEEE 8TH INTERNATIONAL WORKSHOP ON COMPUTATIONAL ADVANCES IN MULTI-SENSOR ADAPTIVE PROCESSING (CAMSAP 2019), 2019, : 594 - 598
  • [47] Solving Twisty Puzzles Using Parallel Q-learning
    Hukmani, Kavish
    Kolekar, Sucheta
    Vobugari, Sreekumar
    ENGINEERING LETTERS, 2021, 29 (04) : 1535 - 1543
  • [48] Feature Extraction in Q-Learning using Neural Networks
    Zhu, Henghui
    Paschalidis, Ioannis Ch.
    Hasselmo, Michael E.
    2017 IEEE 56TH ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2017,
  • [50] Fuzzy Q-learning in continuous state and action space
    Xu M.-L.
    Xu W.-B.
    Journal of China Universities of Posts and Telecommunications, 2010, 17 (04): 100 - 109