Modeling and reinforcement learning in partially observable many-agent systems

Cited by: 0
Authors
He, Keyang [1]
Doshi, Prashant [1]
Banerjee, Bikramjit [2]
Affiliations
[1] Univ Georgia, Sch Comp, THINC Lab, 415 Boyd Res & Educ Ctr, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, 118 Coll Dr 5106, Hattiesburg, MS 39406 USA
Funding
U.S. National Science Foundation
Keywords
Reinforcement learning; Multiagent system; Partial observability; Open system
DOI
10.1007/s10458-024-09640-1
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Subject classification code
0812
Abstract
Many multiagent reinforcement learning (MARL) methods engage in centralized training. These methods rely on all the agents sharing various types of information, such as their actions or gradients, with a centralized trainer or with each other during learning. Consequently, the methods produce agent policies whose prescriptions and performance are contingent on the other agents behaving as assumed during the centralized training. However, in many contexts, such as mixed or adversarial settings, this assumption may not hold. In this article, we present a new line of methods that relaxes this assumption and engages in decentralized training, producing each agent's individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents' candidate behaviors based on (noisy) observations, thus enabling learning at the agent's own level. We also address MARL's prohibitive curse of dimensionality due to the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, often satisfied in practice, large numbers of other agents can be modeled in aggregate by the count vectors of their actions rather than by individual agent models. More importantly, we may model the distribution of these vectors and its update using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the performance of the fully decentralized IA2C along with known baselines on a new Organization domain that we introduce, and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample-efficient, more robust to noise, and scales to learning in systems with up to a hundred agents.
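To make the count-vector idea from the abstract concrete, the following is a minimal sketch of a conjugate Dirichlet-multinomial update over other agents' aggregate action counts. It is an illustration under stated assumptions, not the authors' implementation: the class name DirichletMultinomialModel, its methods, and the toy counts are hypothetical, and the actual IA2C belief update over candidate behaviors is more involved.

    import numpy as np

    class DirichletMultinomialModel:
        # Illustrative aggregate model: under action anonymity and
        # population homogeneity, the behavior of many other agents is
        # summarized by a count vector over the common action set.

        def __init__(self, num_actions, prior=1.0):
            # Dirichlet concentration parameters, one per action.
            self.alpha = np.full(num_actions, prior)

        def update(self, observed_counts):
            # Conjugate update: posterior concentration = prior + counts.
            # Observed counts may come from noisy observations upstream.
            self.alpha = self.alpha + np.asarray(observed_counts, dtype=float)

        def expected_counts(self, num_agents):
            # Mean of the posterior predictive (Dirichlet-multinomial)
            # distribution over the next count vector.
            return num_agents * self.alpha / self.alpha.sum()

        def sample_counts(self, num_agents, rng=None):
            # Sample an action distribution from the Dirichlet posterior,
            # then sample a count vector for the other agents from it.
            rng = rng or np.random.default_rng()
            theta = rng.dirichlet(self.alpha)
            return rng.multinomial(num_agents, theta)

    # Usage: 100 other agents choosing among 3 actions; after a few
    # rounds of observed counts, the model tracks the empirical mix.
    model = DirichletMultinomialModel(num_actions=3)
    for counts in ([60, 30, 10], [55, 35, 10], [62, 28, 10]):
        model.update(counts)
    print(model.expected_counts(num_agents=100))  # approx. [58.7, 31.0, 10.2]

Because the Dirichlet is conjugate to the multinomial, each update is a single vector addition regardless of population size, which is what makes aggregate modeling of up to a hundred agents tractable.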
Pages: 45