Modeling and reinforcement learning in partially observable many-agent systems

Cited: 0
Authors
He, Keyang [1 ]
Doshi, Prashant [1 ]
Banerjee, Bikramjit [2 ]
Affiliations
[1] Univ Georgia, Sch Comp, THINC Lab, 415 Boyd Res & Educ Ctr, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, 118 Coll Dr 5106, Hattiesburg, MS 39406 USA
Funding
U.S. National Science Foundation
Keywords
Reinforcement learning; Multiagent system; Partial observability; Open system
DOI
10.1007/s10458-024-09640-1
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Centralized training is prevalent among multiagent reinforcement learning (MARL) methods. These methods rely on all the agents sharing various types of information, such as their actions or gradients, with a centralized trainer or with each other during learning. Consequently, they produce agent policies whose prescriptions and performance are contingent on the other agents behaving as the centralized training assumed. In many contexts, such as mixed or adversarial settings, this assumption does not hold. In this article, we present a new line of methods that relaxes the assumption and engages in decentralized training, yielding each agent's individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents' candidate behaviors based on (noisy) observations, thus enabling learning at the agent's own level. We also address MARL's prohibitive curse of dimensionality caused by the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, which are often exhibited in practice, large numbers of other agents can be modeled aggregately by the count vectors of their actions instead of by individual agent models. More importantly, we may model the distribution of these count vectors, and its update, using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the performance of the fully decentralized IA2C, along with known baselines, on a novel Organization domain that we introduce and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample efficient, more robust to noise, and scales to learning in systems with up to a hundred agents.
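The abstract's key scaling device is the Dirichlet-multinomial model over action count vectors. The following Python sketch is a rough illustration of that idea only, not the authors' implementation: it shows how a conjugate Dirichlet update over observed action counts yields both a posterior mean over the population's action frequencies and a predictive distribution over the next count vector. All names and constants here (posterior, predictive_counts, NUM_ACTIONS, true_theta) are hypothetical, and the sketch assumes noise-free observation of the counts, whereas the paper also handles noisy observations.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ACTIONS = 3   # illustrative size of the shared action set (made up)
NUM_OTHERS = 100  # other agents; the paper scales to about this many

def posterior(alpha, counts):
    # Conjugate Dirichlet update: add the observed action counts
    # to the Dirichlet concentration parameters.
    return alpha + counts

def predictive_counts(alpha, n):
    # Dirichlet-multinomial posterior predictive: draw an action
    # distribution, then split the n agents across actions.
    theta = rng.dirichlet(alpha)
    return rng.multinomial(n, theta)

alpha = np.ones(NUM_ACTIONS)            # uniform prior over action frequencies
true_theta = np.array([0.7, 0.2, 0.1])  # hidden population behavior (made up)

for _ in range(20):                     # stream of observed count vectors
    observed = rng.multinomial(NUM_OTHERS, true_theta)
    alpha = posterior(alpha, observed)

print("posterior mean:", alpha / alpha.sum())  # concentrates near true_theta
print("sampled next counts:", predictive_counts(alpha, NUM_OTHERS))
```

Because the Dirichlet is conjugate to the multinomial, the belief update is a single vector addition whose cost is independent of the number of agents in the population, which is what makes the aggregate count-vector model attractive at the scale of a hundred agents.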
Pages: 45
Related papers (50 in total)
  • [21] Shang, Wenjie; Li, Qingyang; Qin, Zhiwei; Yu, Yang; Meng, Yiping; Ye, Jieping. Partially observable environment estimation with uplift inference for reinforcement learning based recommendation. Machine Learning, 2021, 110: 2603-2640.
  • [22] Pham, Tan-Hanh; Aikins, Godwyll; Truong, Tri; Nguyen, Kim-Doang. Adaptive Compensation for Robotic Joint Failures Using Partially Observable Reinforcement Learning. Algorithms, 2024, 17(10).
  • [23] Kheiri, Rasoul. A projective simulation scheme for partially observable multi-agent systems. Quantum Machine Intelligence, 2021, 3.
  • [25] Cilden, Erkin; Polat, Faruk. Abstraction in Model Based Partially Observable Reinforcement Learning using Extended Sequence Trees. 2012 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2012), Vol. 2, 2012: 348-355.
  • [26] Asiain, Erick; Clempner, Julio B.; Poznyak, Alexander S. A Reinforcement Learning Approach for Solving the Mean Variance Customer Portfolio in Partially Observable Models. International Journal on Artificial Intelligence Tools, 2018, 27(08).
  • [27] Liang, Zhixuan; Cao, Jiannong; Lin, Wanyu; Chen, Jinlin; Xu, Huafeng. Hierarchical Deep Reinforcement Learning for Multi-robot Cooperation in Partially Observable Environment. 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI 2021), 2021: 272-281.
  • [28] Khoshkbari, Hesam; Kaddoum, Georges. Deep Recurrent Reinforcement Learning for Partially Observable User Association in a Vertical Heterogenous Network. IEEE Communications Letters, 2023, 27(12): 3235-3239.
  • [29] Omatu, Ngozi; Phillips, Joshua L. Benefits of Combining Dimensional Attention and Working Memory for Partially Observable Reinforcement Learning Problems. ACMSE 2021: Proceedings of the 2021 ACM Southeast Conference, 2021: 209-213.
  • [30] Graziano, Vincent; Koutnik, Jan; Schmidhuber, Juergen. Unsupervised Modeling of Partially Observable Environments. Machine Learning and Knowledge Discovery in Databases, Pt I, 2011, 6911: 503-515.