Improving Q-learning by using the agent's action history

Cited by: 0
Authors
Saito M. [1 ]
Sekozawa T. [2 ]
Affiliations
[1] Graduate School of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
[2] Dept. Information Systems Creation, Faculty of Engineering, Kanagawa University, 3-27-1, Rokkakubashi, Kanagawa-ku, Yokohama, Kanagawa
Keywords
Action history; Action selection; Machine learning; Q-learning; Reinforcement learning; Tabu search
DOI
10.1541/ieejeiss.136.1209
Abstract
Q-learning learns an optimal policy by updating an action-value function (the Q-value) through trial-and-error search so as to maximize the expected reward. A major issue, however, is its slow learning speed. We therefore add a technique in which the agent memorizes environmental information and uses it to update the Q-values of many states. Updating the Q-values of multiple states per step gives the agent more information and reduces learning time. Furthermore, by incorporating the stored environmental information into the action-selection method so that failing actions, which cause learning to stagnate, are avoided, the learning speed in the initial stage of learning is improved. In addition, we design a new action-area value function in order to explore many more states from the initial stage of learning. Finally, numerical examples solving a maze problem show the usefulness of the proposed method. © 2016 The Institute of Electrical Engineers of Japan.
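The abstract does not give the concrete update or selection rules, so the following is a minimal sketch of the general idea in Python: tabular Q-learning on a toy maze in which the agent remembers actions that failed (here, hypothetically, actions that leave the state unchanged, such as walking into a wall) and excludes them from epsilon-greedy selection, in the spirit of tabu search. The maze layout, reward values, and the tabu rule are all illustrative assumptions, not the paper's method.

```python
# Sketch of Q-learning with a tabu-style action history (illustrative only).
# Assumption: a "failed" action is one that leaves the state unchanged,
# e.g. walking into a maze wall or off the grid.
import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right

MAZE = [                                        # S = start, G = goal, # = wall
    "S..#",
    ".#..",
    "...#",
    "#..G",
]

def step(state, action):
    """Apply an action; hitting a wall or the border leaves the state unchanged."""
    r, c = state[0] + action[0], state[1] + action[1]
    if 0 <= r < len(MAZE) and 0 <= c < len(MAZE[0]) and MAZE[r][c] != "#":
        state = (r, c)
    done = MAZE[state[0]][state[1]] == "G"
    return state, (1.0 if done else -0.01), done

Q = defaultdict(float)      # (state, action) -> Q-value
tabu = defaultdict(set)     # state -> actions remembered as failures

def select_action(state):
    """Epsilon-greedy choice restricted to non-tabu actions (assumed rule)."""
    allowed = [a for a in ACTIONS if a not in tabu[state]] or ACTIONS
    if random.random() < EPSILON:
        return random.choice(allowed)
    return max(allowed, key=lambda a: Q[(state, a)])

for episode in range(500):
    state = (0, 0)
    for _ in range(200):                        # cap steps so episodes always end
        action = select_action(state)
        nxt, reward, done = step(state, action)
        if nxt == state:                        # failed action: remember and avoid it here
            tabu[state].add(action)
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt
        if done:
            break

print(max(Q[((0, 0), a)] for a in ACTIONS))     # learned value at the start state
```

A permanent tabu set is a simplification; classical tabu search uses a limited tabu tenure, which would let the agent reconsider an action after some number of steps.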
Pages: 1209-1217
Number of pages: 8