Hierarchical multi-agent reinforcement learning

Cited by: 1
Authors
Mohammad Ghavamzadeh
Sridhar Mahadevan
Rajbala Makar
Affiliations
[1] University of Alberta, Department of Computing Science
[2] University of Massachusetts Amherst, Department of Computer Science
[3] Agilent Technologies
Source
Autonomous Agents and Multi-Agent Systems | 2006, Volume 13
Keywords
Hierarchical reinforcement learning; Cooperative multi-agent systems; Coordination; Communication
DOI
Not available
Abstract
In this paper, we investigate the use of hierarchical reinforcement learning (HRL) to speed up the acquisition of cooperative multi-agent tasks. We introduce a hierarchical multi-agent reinforcement learning (RL) framework, and propose a hierarchical multi-agent RL algorithm called Cooperative HRL. In this framework, agents are cooperative and homogeneous (they use the same task decomposition). Learning is decentralized, with each agent learning three interrelated skills: how to perform each individual subtask, the order in which to carry them out, and how to coordinate with other agents. We define cooperative subtasks to be those subtasks in which coordination among agents significantly improves the performance of the overall task. The levels of the hierarchy that include cooperative subtasks are called cooperation levels. A fundamental property of the proposed approach is that it allows agents to learn coordination faster by sharing information at the level of cooperative subtasks, rather than attempting to learn coordination at the level of primitive actions. We study the empirical performance of the Cooperative HRL algorithm using two testbeds: a simulated two-robot trash collection task, and a larger four-agent automated guided vehicle (AGV) scheduling problem. We compare the performance and speed of Cooperative HRL with other learning algorithms, as well as several well-known industrial AGV heuristics.

We also address the issue of rational communication behavior among autonomous agents. The goal is for agents to learn both action and communication policies that together optimize the task given a communication cost. We extend the multi-agent HRL framework to include communication decisions and propose a cooperative multi-agent HRL algorithm called COM-Cooperative HRL. In this algorithm, we add a communication level to the hierarchical decomposition of the problem below each cooperation level. Before an agent makes a decision at a cooperative subtask, it decides whether it is worthwhile to perform a communication action. A communication action has a certain cost and provides the agent with the actions selected by the other agents at a cooperation level. We demonstrate the efficiency of the COM-Cooperative HRL algorithm, as well as the relation between the communication cost and the learned communication policy, using a multi-agent taxi problem.
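The abstract describes two mechanisms: value learning conditioned on other agents' subtask choices at cooperation levels, and an explicit communication decision with a cost below each cooperation level. The Python sketch below is only an illustration of these ideas under stated assumptions, not the authors' implementation; the class, method, and parameter names (CooperativeAgent, choose_subtask, decide_communicate, ALPHA, GAMMA, EPSILON, comm_cost) are hypothetical.

```python
# Illustrative sketch only (not the paper's code): an agent learns values over its
# own subtask choice jointly with the subtasks the other agents are executing
# (Cooperative HRL idea), plus a separate value table for the decision to pay a
# communication cost to learn those choices (COM-Cooperative HRL idea).
# All constants and names are assumptions made for this example.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1   # assumed learning rate, discount, exploration

class CooperativeAgent:
    def __init__(self, subtasks):
        self.subtasks = subtasks            # high-level subtasks at this cooperation level
        self.q = defaultdict(float)         # Q[(state, others_subtasks, my_subtask)]
        self.q_comm = defaultdict(float)    # Q[(state, communicated?)] for the communication level

    def decide_communicate(self, state):
        """Communication level: decide whether paying the communication cost to
        learn the other agents' current subtask choices is worthwhile."""
        if random.random() < EPSILON:
            return random.choice([True, False])
        return self.q_comm[(state, True)] >= self.q_comm[(state, False)]

    def choose_subtask(self, state, others_subtasks):
        """Epsilon-greedy choice of the next subtask, conditioned on what the other
        agents are doing; others_subtasks must be a hashable tuple and is known
        exactly only if the agent communicated."""
        if random.random() < EPSILON:
            return random.choice(self.subtasks)
        return max(self.subtasks, key=lambda a: self.q[(state, others_subtasks, a)])

    def update(self, state, others_subtasks, subtask, reward, steps,
               next_state, next_others_subtasks, communicated, comm_cost):
        """SMDP-style update: the chosen subtask ran for `steps` primitive steps,
        so future value is discounted by GAMMA ** steps. The communication cost
        is charged to the communication-level value when the agent communicated."""
        best_next = max(self.q[(next_state, next_others_subtasks, a)]
                        for a in self.subtasks)
        key = (state, others_subtasks, subtask)
        target = reward + (GAMMA ** steps) * best_next
        self.q[key] += ALPHA * (target - self.q[key])

        comm_key = (state, communicated)
        comm_target = (reward - (comm_cost if communicated else 0.0)
                       + (GAMMA ** steps) * best_next)
        self.q_comm[comm_key] += ALPHA * (comm_target - self.q_comm[comm_key])
```

Because values are learned over subtask choices rather than primitive actions, the joint space an agent must reason about is much smaller, which is the source of the speed-up claimed in the abstract; in the communication table, a higher comm_cost pushes the learned policy toward communicating less often.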
Pages: 197-229
Page count: 32