Improving Q-learning by using the agent's action history

Cited: 0
Authors
Saito M. [1 ]
Sekozawa T. [2 ]
Affiliations
[1] Graduate School of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
[2] Dept. of Information Systems Creation, Faculty of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
Keywords
Action history; Action selection; Machine learning; Q-learning; Reinforcement learning; Tabu search
DOI
10.1541/ieejeiss.136.1209
Abstract
Q-learning learns the optimal policy by updating a state-action value function (Q-value) through trial-and-error search so as to maximize the expected reward. A major issue, however, is its slow learning speed. We therefore add a technique in which the agent memorizes environmental information and uses it to update the Q-values of many states. By updating Q-values in a larger number of states, more information is given to the agent and the learning time is reduced. Furthermore, by incorporating the stored environmental information into the action selection method so that actions known to fail, such as those that cause learning to stagnate, are avoided, the learning speed in the early stage of learning is improved. In addition, we design a new action-area value function in order to explore many more states from the initial stage of learning. Finally, numerical examples solving a maze problem show the usefulness of the proposed method. © 2016 The Institute of Electrical Engineers of Japan.
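The abstract describes the method only at a high level. The following is a minimal sketch (not the authors' code) of how such an approach could look for a maze-style task: tabular Q-learning in which the agent memorizes its recent transitions, replays them so that the Q-values of many states are updated each step, and marks failed state-action pairs as tabu during action selection. The environment interface (env.reset(), env.step()), the constants, and the failure criterion are illustrative assumptions, not details taken from the paper.

    # Hedged sketch: Q-learning with an action-history replay and tabu-style action selection.
    # The maze environment "env" is assumed to provide reset() -> state and
    # step(action) -> (next_state, reward, done); all names and constants are illustrative.
    import random
    from collections import deque

    ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1   # learning rate, discount factor, exploration rate
    HISTORY_LEN = 10                        # how many recent transitions the agent memorizes

    def select_action(Q, state, actions, tabu):
        """Epsilon-greedy selection that skips actions marked tabu for this state."""
        allowed = [a for a in actions if (state, a) not in tabu] or list(actions)
        if random.random() < EPSILON:
            return random.choice(allowed)
        return max(allowed, key=lambda a: Q.get((state, a), 0.0))

    def q_learning_with_history(env, actions, episodes=200):
        Q, tabu = {}, set()
        for _ in range(episodes):
            state = env.reset()
            history = deque(maxlen=HISTORY_LEN)   # recent (s, a, r, s') transitions
            done = False
            while not done:
                action = select_action(Q, state, actions, tabu)
                next_state, reward, done = env.step(action)
                history.append((state, action, reward, next_state))
                if next_state == state:           # e.g. walking into a wall: treat as a failed action
                    tabu.add((state, action))
                # Standard one-step Q-learning update for the current transition ...
                best_next = max(Q.get((next_state, a), 0.0) for a in actions)
                td = reward + GAMMA * best_next - Q.get((state, action), 0.0)
                Q[(state, action)] = Q.get((state, action), 0.0) + ALPHA * td
                # ... plus a replay over the memorized history, so Q-values of many states
                # are updated per environment step.
                for s, a, r, s2 in reversed(history):
                    best = max(Q.get((s2, b), 0.0) for b in actions)
                    Q[(s, a)] = Q.get((s, a), 0.0) + ALPHA * (r + GAMMA * best - Q.get((s, a), 0.0))
                state = next_state
        return Q

The tabu set here stands in for the paper's idea of avoiding failure behaviors during action selection; the history replay stands in for updating the Q-value in many states from the memorized environmental information.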
Pages: 1209-1217
Page count: 8
Related papers
50 records in total
  • [21] Multi Q-Table Q-Learning
    Kantasewi, Nitchakun
    Marukatat, Sanparith
    Thainimit, Somying
    Manabu, Okumura
    2019 10TH INTERNATIONAL CONFERENCE OF INFORMATION AND COMMUNICATION TECHNOLOGY FOR EMBEDDED SYSTEMS (IC-ICTES), 2019,
  • [22] Multi Target Tracking using a Compact Q-Learning with a Teacher
    Saad, E. M.
    Awadalla, M. H.
    Hamdy, A. M.
    Ali, H. I.
    ICCES: 2008 INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS, 2007, : 173 - 178
  • [23] Cooperative Multi-Agent Q-Learning Using Distributed MPC
    Esfahani, Hossein Nejatbakhsh
    Velni, Javad Mohammadpour
    IEEE CONTROL SYSTEMS LETTERS, 2024, 8 : 2193 - 2198
  • [24] Enhancing Nash Q-learning and Team Q-learning mechanisms by using bottlenecks
    Ghazanfari, Behzad
    Mozayani, Nasser
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2014, 26 (06) : 2771 - 2783
  • [25] Automated Portfolio Rebalancing using Q-learning
    Darapaneni, Narayana
    Basu, Amitavo
    Savla, Sanket
    Gururajan, Raamanathan
    Saquib, Najmus
    Singhavi, Sudarshan
    Kale, Aishwarya
    Bid, Pratik
    Paduri, Anwesh Reddy
    2020 11TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2020, : 596 - 602
  • [26] Learning rates for Q-learning
    Even-Dar, E
    Mansour, Y
    JOURNAL OF MACHINE LEARNING RESEARCH, 2003, 5 : 1 - 25
  • [27] Q-learning based on neural network in learning action selection of mobile robot
    Qiao, Junfei
    Hou, Zhanjun
    Ruan, Xiaogang
    2007 IEEE INTERNATIONAL CONFERENCE ON AUTOMATION AND LOGISTICS, VOLS 1-6, 2007, : 263 - 267
  • [28] Improving Energy Efficiency and QoS of LPWANs for IoT Using Q-Learning Based Data Routing
    Pandey, Om Jee
    Yuvaraj, Tankala
    Paul, Joseph K.
    Nguyen, Ha H.
    Gundepudi, Karthikay
    Shukla, Mahendra K.
    IEEE TRANSACTIONS ON COGNITIVE COMMUNICATIONS AND NETWORKING, 2022, 8 (01) : 365 - 379
  • [29] CVaR Q-Learning
    Stanko, Silvestr
    Macek, Karel
    COMPUTATIONAL INTELLIGENCE: 11th International Joint Conference, IJCCI 2019, Vienna, Austria, September 17-19, 2019, Revised Selected Papers, 2021, 922 : 333 - 358
  • [30] Impact of Neighboring Agent's Characteristics with Q-Learning in Network Multi-agent System
    Kaur, Harjot
    Devi, Ginni
    ADVANCED INFORMATICS FOR COMPUTING RESEARCH, ICAICR 2018, PT I, 2019, 955 : 744 - 756