Modeling and reinforcement learning in partially observable many-agent systems

Cited by: 0
Authors
He, Keyang [1]
Doshi, Prashant [1]
Banerjee, Bikramjit [2]
Affiliations
[1] Univ Georgia, Sch Comp, THINC Lab, 415 Boyd Res & Educ Ctr, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, 118 Coll Dr 5106, Hattiesburg, MS 39406 USA
Funding
U.S. National Science Foundation
Keywords
Reinforcement learning; Multiagent system; Partial observability; Open system
DOI
10.1007/s10458-024-09640-1
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Subject classification code
0812
Abstract
Many multiagent reinforcement learning (MARL) methods engage in centralized training. These methods rely on all the agents sharing various types of information, such as their actions or gradients, with a centralized trainer or with each other during learning. Consequently, the methods produce agent policies whose prescriptions and performance are contingent on the other agents behaving as assumed during the centralized training. However, in many contexts, such as mixed or adversarial settings, this assumption may not hold. In this article, we present a new line of methods that relaxes this assumption and engages in decentralized training, producing each agent's individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents' candidate behaviors based on (noisy) observations, thus enabling learning at the agent's own level. We also address MARL's prohibitive curse of dimensionality due to the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, often satisfied in practice, large numbers of other agents can be modeled in aggregate by the count vectors of their actions rather than by individual agent models. More importantly, we may model the distribution of these vectors and its update using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the performance of the fully decentralized IA2C along with known baselines on a new Organization domain that we introduce, and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample-efficient, more robust to noise, and scales to learning in systems with up to a hundred agents.
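To make the count-vector idea from the abstract concrete, the following is a minimal sketch of a conjugate Dirichlet-multinomial update over other agents' aggregate action counts. It is an illustration under stated assumptions, not the authors' implementation: the class name DirichletMultinomialModel, its methods, and the toy counts are hypothetical, and the actual IA2C belief update over candidate behaviors is more involved.

    import numpy as np

    class DirichletMultinomialModel:
        # Illustrative aggregate model: under action anonymity and
        # population homogeneity, the behavior of many other agents is
        # summarized by a count vector over the common action set.

        def __init__(self, num_actions, prior=1.0):
            # Dirichlet concentration parameters, one per action.
            self.alpha = np.full(num_actions, prior)

        def update(self, observed_counts):
            # Conjugate update: posterior concentration = prior + counts.
            # Observed counts may come from noisy observations upstream.
            self.alpha = self.alpha + np.asarray(observed_counts, dtype=float)

        def expected_counts(self, num_agents):
            # Mean of the posterior predictive (Dirichlet-multinomial)
            # distribution over the next count vector.
            return num_agents * self.alpha / self.alpha.sum()

        def sample_counts(self, num_agents, rng=None):
            # Sample an action distribution from the Dirichlet posterior,
            # then sample a count vector for the other agents from it.
            rng = rng or np.random.default_rng()
            theta = rng.dirichlet(self.alpha)
            return rng.multinomial(num_agents, theta)

    # Usage: 100 other agents choosing among 3 actions; after a few
    # rounds of observed counts, the model tracks the empirical mix.
    model = DirichletMultinomialModel(num_actions=3)
    for counts in ([60, 30, 10], [55, 35, 10], [62, 28, 10]):
        model.update(counts)
    print(model.expected_counts(num_agents=100))  # approx. [58.7, 31.0, 10.2]

Because the Dirichlet is conjugate to the multinomial, each update is a single vector addition regardless of population size, which is what makes aggregate modeling of up to a hundred agents tractable.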
Pages: 45