Modeling and reinforcement learning in partially observable many-agent systems

Cited by: 0
Authors
He, Keyang [1 ]
Doshi, Prashant [1 ]
Banerjee, Bikramjit [2 ]
Affiliations
[1] Univ Georgia, Sch Comp, THINC Lab, 415 Boyd Res & Educ Ctr, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, 118 Coll Dr 5106, Hattiesburg, MS 39406 USA
Funding
US National Science Foundation
Keywords
Reinforcement learning; Multiagent system; Partial observability; Open system
DOI
10.1007/s10458-024-09640-1
Chinese Library Classification
TP [Automation technology, computer technology]
Discipline Code
0812
Abstract
Many multiagent reinforcement learning (MARL) methods engage in centralized training. These methods rely on all agents sharing various types of information, such as their actions or gradients, with a centralized trainer or with each other during learning. The resulting agent policies' prescriptions and performance are therefore contingent on the other agents behaving as the centralized training assumed. In many contexts, however, such as mixed or adversarial settings, this assumption may not hold. In this article, we present a new line of methods that relaxes this assumption and engages in decentralized training, producing each agent's individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents' candidate behaviors based on (noisy) observations, thus enabling learning at the agent's own level. We also address MARL's prohibitive curse of dimensionality due to the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, which often hold in practice, large numbers of other agents can be modeled in the aggregate by the count vectors of their actions instead of by individual agent models. More importantly, we may model the distribution of these count vectors and its update using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the fully decentralized IA2C against known baselines on a novel Organization domain, which we introduce, and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample efficient, is more robust to noise, and can scale to learning in systems with up to a hundred agents.
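To make the aggregation idea concrete, the following is a minimal sketch of a Dirichlet-multinomial count-vector model of the kind the abstract describes, assuming action anonymity and a homogeneous population. It is not the authors' implementation; the class name, the symmetric prior, and the prediction routine are all illustrative (Python/NumPy).

```python
import numpy as np

class DirichletActionModel:
    """Illustrative belief over a large population's action distribution.

    Under action anonymity, the other agents are summarized by a count
    vector over the A possible actions rather than by individual models.
    """

    def __init__(self, num_actions: int, prior: float = 1.0):
        # Symmetric Dirichlet prior over the A action frequencies.
        self.alpha = np.full(num_actions, prior, dtype=float)

    def update(self, observed_counts: np.ndarray) -> None:
        # Conjugate update: add the (possibly noisy) observed action
        # counts for this step to the Dirichlet pseudo-counts.
        self.alpha += observed_counts

    def expected_distribution(self) -> np.ndarray:
        # Posterior mean of the population's action frequencies.
        return self.alpha / self.alpha.sum()

    def predict_counts(self, population_size: int, rng=None) -> np.ndarray:
        # Sample a plausible next count vector: draw frequencies from
        # the Dirichlet posterior, then counts from a multinomial.
        rng = rng or np.random.default_rng()
        theta = rng.dirichlet(self.alpha)
        return rng.multinomial(population_size, theta)

# Usage: 100 other agents choosing among 3 actions.
model = DirichletActionModel(num_actions=3)
model.update(np.array([60, 30, 10]))   # observed counts at one step
print(model.expected_distribution())   # posterior mean frequencies
print(model.predict_counts(100))       # sampled count vector for next step
```

Because the Dirichlet is conjugate to the multinomial, the belief update is a constant-time addition of counts regardless of the number of agents, which is what makes this kind of aggregate modeling attractive for scaling to many-agent systems.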
Pages: 45