Modeling and reinforcement learning in partially observable many-agent systems

Cited by: 0
Authors
He, Keyang [1 ]
Doshi, Prashant [1 ]
Banerjee, Bikramjit [2 ]
Affiliations
[1] Univ Georgia, Sch Comp, THINC Lab, 415 Boyd Res & Educ Ctr, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, 118 Coll Dr 5106, Hattiesburg, MS 39406 USA
Funding
US National Science Foundation
Keywords
Reinforcement learning; Multiagent system; Partial observability; Open system
DOI
10.1007/s10458-024-09640-1
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Many multiagent reinforcement learning (MARL) methods engage in centralized training. These methods rely on all agents sharing various types of information, such as their actions or gradients, with a centralized trainer or with each other during learning. The resulting agent policies' prescriptions and performance are therefore contingent on the other agents behaving as the centralized training assumed. In many contexts, however, such as mixed or adversarial settings, this assumption may not hold. In this article, we present a new line of methods that relaxes this assumption and engages in decentralized training, producing each agent's individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents' candidate behaviors based on (noisy) observations, thus enabling learning at the agent's own level. We also address MARL's prohibitive curse of dimensionality due to the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, which often hold in practice, large numbers of other agents can be modeled in the aggregate by the count vectors of their actions instead of by individual agent models. More importantly, we may model the distribution of these count vectors and its update using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the fully decentralized IA2C against known baselines on a novel Organization domain, which we introduce, and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample efficient, is more robust to noise, and can scale to learning in systems with up to a hundred agents.
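To make the aggregation idea concrete, the following is a minimal sketch of a Dirichlet-multinomial count-vector model of the kind the abstract describes, assuming action anonymity and a homogeneous population. It is not the authors' implementation; the class name, the symmetric prior, and the prediction routine are all illustrative (Python/NumPy).

```python
import numpy as np

class DirichletActionModel:
    """Illustrative belief over a large population's action distribution.

    Under action anonymity, the other agents are summarized by a count
    vector over the A possible actions rather than by individual models.
    """

    def __init__(self, num_actions: int, prior: float = 1.0):
        # Symmetric Dirichlet prior over the A action frequencies.
        self.alpha = np.full(num_actions, prior, dtype=float)

    def update(self, observed_counts: np.ndarray) -> None:
        # Conjugate update: add the (possibly noisy) observed action
        # counts for this step to the Dirichlet pseudo-counts.
        self.alpha += observed_counts

    def expected_distribution(self) -> np.ndarray:
        # Posterior mean of the population's action frequencies.
        return self.alpha / self.alpha.sum()

    def predict_counts(self, population_size: int, rng=None) -> np.ndarray:
        # Sample a plausible next count vector: draw frequencies from
        # the Dirichlet posterior, then counts from a multinomial.
        rng = rng or np.random.default_rng()
        theta = rng.dirichlet(self.alpha)
        return rng.multinomial(population_size, theta)

# Usage: 100 other agents choosing among 3 actions.
model = DirichletActionModel(num_actions=3)
model.update(np.array([60, 30, 10]))   # observed counts at one step
print(model.expected_distribution())   # posterior mean frequencies
print(model.predict_counts(100))       # sampled count vector for next step
```

Because the Dirichlet is conjugate to the multinomial, the belief update is a constant-time addition of counts regardless of the number of agents, which is what makes this kind of aggregate modeling attractive for scaling to many-agent systems.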
Pages: 45