Policy Sharing Using Aggregation Trees for Q-Learning in a Continuous State and Action Spaces

Cited by: 4
Authors
Chen, Yu-Jen [1]
Jiang, Wei-Cheng [2]
Ju, Ming-Yi [3]
Hwang, Kao-Shing [4,5]
Affiliations
[1] Natl Chung Cheng Univ, Dept Elect Engn, Chiayi 62102, Taiwan
[2] Tunghai Univ, Dept Elect Engn, Taichung 40704, Taiwan
[3] Natl Univ Tainan, Dept Comp Sci & Informat Engn, Tainan 70005, Taiwan
[4] Natl Sun Yat Sen Univ, Dept Elect Engn, Kaohsiung 80424, Taiwan
[5] Kaohsiung Med Univ, Dept Healthcare Adm & Med Informat, Kaohsiung 80708, Taiwan
Keywords
Multiagent system; policy sharing; Q-learning; tree structure;
DOI
10.1109/TCDS.2019.2926477
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
Q-learning is a generic approach that uses a finite, discrete state and action domain and estimates action values with tabular or function-approximation methods. An intelligent agent learns policies from continuous sensory inputs by encoding these environmental inputs onto a discrete state space, and the application of Q-learning in a continuous state/action domain is the subject of many studies. This paper uses a tree structure to approximate a Q-function in a continuous state domain. The agent selects the discretized action with the maximum Q-value, and this discretized action is then extended to a continuous action using an action bias function. Reinforcement learning is also difficult for a single agent when the state space is huge, so the proposed architecture is applied to a multiagent system in which an individual agent transfers its useful Q-values to other agents to accelerate the learning process. Policy is shared between agents by grafting the branches of trees in which Q-values are stored onto other trees. Simulation results show that the proposed architecture performs better than tabular Q-learning and significantly accelerates learning because all agents use the sharing mechanisms to cooperate with each other.
Pages: 474-485
Number of pages: 12
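The abstract outlines three mechanisms: a tree that partitions the continuous state space and stores Q-values at its leaves, an action bias function that turns the greedy discretized action into a continuous action, and policy sharing by grafting branches of one agent's tree onto another's. The Python sketch below only illustrates these ideas under stated assumptions; the class names (TreeNode, TreeQAgent, share_policy), the splitting rule, and the interpolation used for the action bias are hypothetical stand-ins, not the authors' implementation.

```python
import copy
import random


class TreeNode:
    """Node of a simple binary state-space partition tree; leaves store Q-values."""

    def __init__(self, q_values, depth=0):
        self.q_values = q_values      # one Q-value per discretized action (used at leaves)
        self.depth = depth
        self.split_dim = None         # state dimension this node splits on (internal nodes)
        self.split_val = None
        self.left = None
        self.right = None

    def is_leaf(self):
        return self.left is None

    def leaf_for(self, state):
        """Descend to the leaf whose region of the continuous state space contains `state`."""
        node = self
        while not node.is_leaf():
            node = node.left if state[node.split_dim] < node.split_val else node.right
        return node

    def split(self, dim, value):
        """Refine the partition by splitting this leaf into two children."""
        self.split_dim, self.split_val = dim, value
        self.left = TreeNode(list(self.q_values), self.depth + 1)
        self.right = TreeNode(list(self.q_values), self.depth + 1)


class TreeQAgent:
    """Q-learning over a continuous state space using a tree-based Q-function."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.n_actions = n_actions
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.root = TreeNode([0.0] * n_actions)

    def q(self, state):
        return self.root.leaf_for(state).q_values

    def select_action(self, state):
        """Epsilon-greedy choice among the discretized actions of the covering leaf."""
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        q = self.q(state)
        return max(range(self.n_actions), key=lambda a: q[a])

    def continuous_action(self, state, action_index, action_grid):
        """Extend the chosen discrete action to a continuous one with a bias term.

        The bias here interpolates toward the better neighbouring action in
        proportion to its Q-value; this is an assumed stand-in for the paper's
        action bias function, not its actual definition.
        """
        q = self.q(state)
        a = action_grid[action_index]
        neighbours = [i for i in (action_index - 1, action_index + 1)
                      if 0 <= i < self.n_actions]
        best = max(neighbours, key=lambda i: q[i])
        weight = abs(q[best]) / (abs(q[action_index]) + abs(q[best]) + 1e-8)
        return a + weight * (action_grid[best] - a)

    def update(self, state, action_index, reward, next_state):
        """One-step Q-learning backup applied to the leaf covering `state`."""
        leaf = self.root.leaf_for(state)
        target = reward + self.gamma * max(self.q(next_state))
        leaf.q_values[action_index] += self.alpha * (target - leaf.q_values[action_index])


def share_policy(donor, receiver, state):
    """Policy sharing by grafting: copy the donor's subtree covering `state`
    onto the receiver's matching leaf, so the receiver inherits both the finer
    partition and the learned Q-values for that region."""
    r_node = receiver.root.leaf_for(state)
    d_node = donor.root
    while not d_node.is_leaf() and d_node.depth < r_node.depth:
        d_node = d_node.left if state[d_node.split_dim] < d_node.split_val else d_node.right
    r_node.__dict__.update(copy.deepcopy(d_node).__dict__)
```

In a multiagent run, each agent would learn with its own tree and periodically call share_policy(expert, novice, state) for state regions where the expert's Q-values are judged useful; how that usefulness is decided is part of the paper's sharing mechanism and is not modeled in this sketch.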