On Centralized Critics in Multi-Agent Reinforcement Learning

Cited by: 0
Authors
Lyu, Xueguang [1 ]
Baisero, Andrea [1 ]
Xiao, Yuchen [1 ]
Daley, Brett [1 ]
Amato, Christopher [1 ]
Affiliation
[1] Northeastern Univ, Khoury Coll Comp Sci, 360 Huntington Ave, Boston, MA 02115 USA
Keywords
LEARNERS;
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Centralized Training for Decentralized Execution (CTDE), where agents are trained offline in a centralized fashion and execute online in a decentralized manner, has become a popular approach in Multi-Agent Reinforcement Learning (MARL). In particular, it has become common to develop actor-critic methods that train decentralized actors with a centralized critic, where the centralized critic is given access to global information about the entire system, including the true system state. Such centralized critics are possible because global information is available during offline training, and they are not used during online execution. While these methods perform well in a number of domains and have become a de facto standard in MARL, the use of a centralized critic in this context has yet to be sufficiently analyzed, theoretically or empirically. In this paper, we therefore formally analyze centralized and decentralized critic approaches and study the effect of using state-based critics in partially observable environments. We derive theoretical results that run contrary to common intuition: critic centralization is not strictly beneficial, and using state values can be harmful. In particular, we prove that state-based critics can introduce unexpected bias and variance compared to history-based critics. Finally, we demonstrate how the theory applies in practice by comparing different forms of critics on a wide range of common multi-agent benchmarks. The experiments reveal practical issues, such as the difficulty of representation learning under partial observability, which help explain why these theoretical problems are often overlooked in the literature.
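To make the distinction concrete, here is a minimal sketch of the actor-critic gradient forms the abstract contrasts, assuming standard Dec-POMDP notation (state s, joint history h, joint action a, and local history h_i and action a_i for agent i); the notation and the generic policy-gradient form are our own illustration, not taken from the paper, whose exact estimators may differ:

  ∇_{θ_i} J(θ) = E[ ∇_{θ_i} log π_{θ_i}(a_i | h_i) · Q̂ ],  where the critic Q̂ may be
    Q̂ = Q_i(h_i, a_i)   (decentralized, history-based critic)
    Q̂ = Q(h, a)         (centralized, history-based critic)
    Q̂ = Q(s, a)         (centralized, state-based critic)

The abstract's claim concerns the last form: conditioning the critic on the state s rather than on histories can introduce bias and variance relative to a history-based critic.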
Pages: 295-354
Number of pages: 60