Policy Sharing Using Aggregation Trees for Q-Learning in a Continuous State and Action Spaces

Cited by: 4
Authors
Chen, Yu-Jen [1]
Jiang, Wei-Cheng [2]
Ju, Ming-Yi [3]
Hwang, Kao-Shing [4,5]
Affiliations
[1] Natl Chung Cheng Univ, Dept Elect Engn, Chiayi 62102, Taiwan
[2] Tunghai Univ, Dept Elect Engn, Taichung 40704, Taiwan
[3] Natl Univ Tainan, Dept Comp Sci & Informat Engn, Tainan 70005, Taiwan
[4] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[5] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung 80708, Taiwan
Keywords
Multiagent system; policy sharing; Q-learning; tree structure;
DOI
10.1109/TCDS.2019.2926477
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Q-learning is a generic approach that uses a finite, discrete state and action domain and estimates action values with tabular or function-approximation methods. An intelligent agent learns policies from continuous sensory inputs by encoding these environmental inputs onto a discrete state space, and the application of Q-learning in a continuous state/action domain is the subject of many studies. This paper uses a tree structure to approximate a Q-function in a continuous state domain. The agent selects the discretized action with the maximum Q-value, and this discretized action is then extended to a continuous action using an action bias function. Reinforcement learning is also difficult for a single agent when the state space is huge, so the proposed architecture is applied to a multiagent system in which an individual agent transfers its useful Q-values to other agents to accelerate the learning process. Policy is shared between agents by grafting the branches of trees in which Q-values are stored onto other trees. Simulation results show that the proposed architecture performs better than tabular Q-learning and significantly accelerates learning because all agents use the sharing mechanisms to cooperate with each other.
Pages: 474-485
Number of pages: 12
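The abstract outlines three mechanisms: a tree that partitions the continuous state space and stores Q-values at its leaves, an action bias function that turns the greedy discretized action into a continuous action, and policy sharing by grafting branches of one agent's tree onto another's. The Python sketch below only illustrates these ideas under stated assumptions; the class names (TreeNode, TreeQAgent, share_policy), the splitting rule, and the interpolation used for the action bias are hypothetical stand-ins, not the authors' implementation.

```python
import copy
import random


class TreeNode:
    """Node of a simple binary state-space partition tree; leaves store Q-values."""

    def __init__(self, q_values, depth=0):
        self.q_values = q_values      # one Q-value per discretized action (used at leaves)
        self.depth = depth
        self.split_dim = None         # state dimension this node splits on (internal nodes)
        self.split_val = None
        self.left = None
        self.right = None

    def is_leaf(self):
        return self.left is None

    def leaf_for(self, state):
        """Descend to the leaf whose region of the continuous state space contains `state`."""
        node = self
        while not node.is_leaf():
            node = node.left if state[node.split_dim] < node.split_val else node.right
        return node

    def split(self, dim, value):
        """Refine the partition by splitting this leaf into two children."""
        self.split_dim, self.split_val = dim, value
        self.left = TreeNode(list(self.q_values), self.depth + 1)
        self.right = TreeNode(list(self.q_values), self.depth + 1)


class TreeQAgent:
    """Q-learning over a continuous state space using a tree-based Q-function."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.root = TreeNode([0.0] * n_actions)

    def q(self, state):
        return self.root.leaf_for(state).q_values

    def select_action(self, state):
        """Epsilon-greedy choice among the discretized actions of the covering leaf."""
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        q = self.q(state)
        return max(range(self.n_actions), key=lambda a: q[a])

    def continuous_action(self, state, action_index, action_grid):
        """Extend the chosen discrete action to a continuous one with a bias term.

        The bias here interpolates toward the better neighbouring action in
        proportion to its Q-value; this is an assumed stand-in for the paper's
        action bias function, not its actual definition.
        """
        q = self.q(state)
        a = action_grid[action_index]
        neighbours = [i for i in (action_index - 1, action_index + 1)
                      if 0 <= i < self.n_actions]
        best = max(neighbours, key=lambda i: q[i])
        weight = abs(q[best]) / (abs(q[action_index]) + abs(q[best]) + 1e-8)
        return a + weight * (action_grid[best] - a)

    def update(self, state, action_index, reward, next_state):
        """One-step Q-learning backup applied to the leaf covering `state`."""
        leaf = self.root.leaf_for(state)
        target = reward + self.gamma * max(self.q(next_state))
        leaf.q_values[action_index] += self.alpha * (target - leaf.q_values[action_index])


def share_policy(donor, receiver, state):
    """Policy sharing by grafting: copy the donor's subtree covering `state`
    onto the receiver's matching leaf, so the receiver inherits both the finer
    partition and the learned Q-values for that region."""
    r_node = receiver.root.leaf_for(state)
    d_node = donor.root
    while not d_node.is_leaf() and d_node.depth < r_node.depth:
        d_node = d_node.left if state[d_node.split_dim] < d_node.split_val else d_node.right
    r_node.__dict__.update(copy.deepcopy(d_node).__dict__)
```

In a multiagent run, each agent would learn with its own tree and periodically call share_policy(expert, novice, state) for state regions where the expert's Q-values are judged useful; how that usefulness is decided is part of the paper's sharing mechanism and is not modeled in this sketch.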