Bottom-up multi-agent reinforcement learning by reward shaping for cooperative-competitive tasks

Cited by: 0
Authors
Takumi Aotani
Taisuke Kobayashi
Kenji Sugimoto
Affiliations
[1] Nara Institute of Science and Technology, Division of Information Science
Source
Applied Intelligence | 2021 / Vol. 51
Keywords
Distributed autonomous system; Reinforcement learning; Reward shaping; Interests between agents
DOI
Not available
Abstract
A multi-agent system (MAS) is expected to be applied to various real-world problems that a single agent cannot accomplish alone. Due to the inherent complexity of real-world MAS, however, manually designing the group behaviors of agents is intractable. Multi-agent reinforcement learning (MARL), a framework in which multiple agents in the same environment adaptively learn their policies via reinforcement learning, is a promising methodology for handling this complexity. To acquire group behaviors through MARL, all the agents must understand how to achieve their respective tasks cooperatively. We have previously proposed “bottom-up MARL”, a decentralized system for managing real and large-scale MARL, together with a reward shaping algorithm that represents group behaviors. That reward shaping algorithm, however, assumes that all the agents are in cooperative relationships to some extent. In this paper, we therefore extend the algorithm so that the agents need no prior knowledge of the interests between them. The interests are regarded as correlation coefficients between the agents’ rewards, which are estimated numerically in an online manner. In both simulations and real experiments conducted without knowledge of the interests between the agents, the agents correctly estimated their interests, which allowed them to derive new rewards representing feasible group behaviors in a decentralized manner. As a result, our extended algorithm succeeded in acquiring group behaviors in tasks ranging from cooperative to competitive.
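The online estimation of interests described above can be illustrated with a small sketch. The Python snippet below maintains exponentially weighted running statistics of two agents’ reward streams, returns their correlation coefficient as the estimated interest, and uses it to weight a neighbor’s reward in a shaped reward. It is a minimal sketch under stated assumptions: the class and function names, the exponential-moving-average statistics, and the linear shaping rule are illustrative choices, not the authors’ exact algorithm.

    import math

    class OnlineInterestEstimator:
        # Exponentially weighted running estimate of the correlation
        # coefficient ("interest") between this agent's reward and one
        # neighbor's reward. The EMA formulation is an assumption for
        # illustration, not the paper's exact update rule.

        def __init__(self, alpha=0.01, eps=1e-8):
            self.alpha = alpha    # smoothing rate of the running statistics
            self.eps = eps        # guards against division by zero
            self.mean_i = 0.0     # running mean of own reward
            self.mean_j = 0.0     # running mean of neighbor's reward
            self.var_i = 0.0      # running variance of own reward
            self.var_j = 0.0      # running variance of neighbor's reward
            self.cov = 0.0        # running covariance of the two rewards

        def update(self, r_i, r_j):
            # Incorporate one pair of observed rewards and return the
            # current correlation estimate, clipped to [-1, 1].
            a = self.alpha
            d_i = r_i - self.mean_i
            d_j = r_j - self.mean_j
            self.mean_i += a * d_i
            self.mean_j += a * d_j
            # Standard EMA variance/covariance updates.
            self.var_i = (1.0 - a) * (self.var_i + a * d_i * d_i)
            self.var_j = (1.0 - a) * (self.var_j + a * d_j * d_j)
            self.cov = (1.0 - a) * (self.cov + a * d_i * d_j)
            rho = self.cov / (math.sqrt(self.var_i * self.var_j) + self.eps)
            return max(-1.0, min(1.0, rho))

    def shaped_reward(r_own, neighbor_rewards, interests):
        # Hypothetical linear reward shaping: add each neighbor's reward
        # weighted by the estimated interest toward that neighbor.
        return r_own + sum(rho * r for rho, r in zip(interests, neighbor_rewards))

    # Example: a positively correlated (cooperative) reward stream drives
    # the interest estimate toward +1; anti-correlated (competitive)
    # rewards would drive it toward -1.
    est = OnlineInterestEstimator(alpha=0.05)
    for r_i, r_j in [(1.0, 0.9), (0.2, 0.3), (0.8, 0.7), (0.1, 0.2)]:
        rho = est.update(r_i, r_j)
    print(shaped_reward(1.0, [0.9], [rho]))

A decaying-average estimator of this kind lets the interest track non-stationary relationships as the agents’ policies change during learning, which is why an online formulation fits the decentralized setting.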
Pages: 4434-4452
Number of pages: 18