Modeling and reinforcement learning in partially observable many-agent systems

Cited: 0
Authors
He, Keyang [1 ]
Doshi, Prashant [1 ]
Banerjee, Bikramjit [2 ]
Affiliations
[1] Univ Georgia, Sch Comp, THINC Lab, 415 Boyd Res & Educ Ctr, Athens, GA 30602 USA
[2] Univ Southern Mississippi, Sch Comp Sci & Comp Engn, 118 Coll Dr 5106, Hattiesburg, MS 39406 USA
Funding
U.S. National Science Foundation
Keywords
Reinforcement learning; Multiagent system; Partial observability; Open system
DOI
10.1007/s10458-024-09640-1
Chinese Library Classification (CLC)
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
Centralized training is prevalent among multiagent reinforcement learning (MARL) methods. These methods rely on all the agents sharing various types of information, such as their actions or gradients, with a centralized trainer or with each other during learning. Consequently, they produce agent policies whose prescriptions and performance are contingent on the other agents behaving as the centralized training assumed. In many contexts, such as mixed or adversarial settings, this assumption does not hold. In this article, we present a new line of methods that relaxes the assumption and engages in decentralized training, yielding each agent's individual policy. The interactive advantage actor-critic (IA2C) maintains and updates beliefs over other agents' candidate behaviors based on (noisy) observations, thus enabling learning at the agent's own level. We also address MARL's prohibitive curse of dimensionality caused by the presence of many agents in the system. Under assumptions of action anonymity and population homogeneity, which are often exhibited in practice, large numbers of other agents can be modeled aggregately by the count vectors of their actions instead of by individual agent models. More importantly, we may model the distribution of these count vectors, and its update, using the Dirichlet-multinomial model, which offers an elegant way to scale IA2C to many-agent systems. We evaluate the performance of the fully decentralized IA2C, along with known baselines, on a novel Organization domain that we introduce and on instances of two existing domains. Experimental comparisons with prominent and recent baselines show that IA2C is more sample efficient, more robust to noise, and scales to learning in systems with up to a hundred agents.
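The abstract's key scaling device is the Dirichlet-multinomial model over action count vectors. The following Python sketch is a rough illustration of that idea only, not the authors' implementation: it shows how a conjugate Dirichlet update over observed action counts yields both a posterior mean over the population's action frequencies and a predictive distribution over the next count vector. All names and constants here (posterior, predictive_counts, NUM_ACTIONS, true_theta) are hypothetical, and the sketch assumes noise-free observation of the counts, whereas the paper also handles noisy observations.

```python
import numpy as np

rng = np.random.default_rng(0)

NUM_ACTIONS = 3   # illustrative size of the shared action set (made up)
NUM_OTHERS = 100  # other agents; the paper scales to about this many

def posterior(alpha, counts):
    # Conjugate Dirichlet update: add the observed action counts
    # to the Dirichlet concentration parameters.
    return alpha + counts

def predictive_counts(alpha, n):
    # Dirichlet-multinomial posterior predictive: draw an action
    # distribution, then split the n agents across actions.
    theta = rng.dirichlet(alpha)
    return rng.multinomial(n, theta)

alpha = np.ones(NUM_ACTIONS)            # uniform prior over action frequencies
true_theta = np.array([0.7, 0.2, 0.1])  # hidden population behavior (made up)

for _ in range(20):                     # stream of observed count vectors
    observed = rng.multinomial(NUM_OTHERS, true_theta)
    alpha = posterior(alpha, observed)

print("posterior mean:", alpha / alpha.sum())  # concentrates near true_theta
print("sampled next counts:", predictive_counts(alpha, NUM_OTHERS))
```

Because the Dirichlet is conjugate to the multinomial, the belief update is a single vector addition whose cost is independent of the number of agents in the population, which is what makes the aggregate count-vector model attractive at the scale of a hundred agents.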
Pages: 45
Related papers (50 in total)
  • [21] Shang, Wenjie; Li, Qingyang; Qin, Zhiwei; Yu, Yang; Meng, Yiping; Ye, Jieping. Partially observable environment estimation with uplift inference for reinforcement learning based recommendation. Machine Learning, 2021, 110: 2603-2640.
  • [22] Pham, Tan-Hanh; Aikins, Godwyll; Truong, Tri; Nguyen, Kim-Doang. Adaptive Compensation for Robotic Joint Failures Using Partially Observable Reinforcement Learning. Algorithms, 2024, 17(10).
  • [23] Kheiri, Rasoul. A projective simulation scheme for partially observable multi-agent systems. Quantum Machine Intelligence, 2021, 3.
  • [25] Cilden, Erkin; Polat, Faruk. Abstraction in Model Based Partially Observable Reinforcement Learning using Extended Sequence Trees. 2012 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT 2012), Vol. 2, 2012: 348-355.
  • [26] Asiain, Erick; Clempner, Julio B.; Poznyak, Alexander S. A Reinforcement Learning Approach for Solving the Mean Variance Customer Portfolio in Partially Observable Models. International Journal on Artificial Intelligence Tools, 2018, 27(08).
  • [27] Liang, Zhixuan; Cao, Jiannong; Lin, Wanyu; Chen, Jinlin; Xu, Huafeng. Hierarchical Deep Reinforcement Learning for Multi-robot Cooperation in Partially Observable Environment. 2021 IEEE Third International Conference on Cognitive Machine Intelligence (CogMI 2021), 2021: 272-281.
  • [28] Khoshkbari, Hesam; Kaddoum, Georges. Deep Recurrent Reinforcement Learning for Partially Observable User Association in a Vertical Heterogenous Network. IEEE Communications Letters, 2023, 27(12): 3235-3239.
  • [29] Omatu, Ngozi; Phillips, Joshua L. Benefits of Combining Dimensional Attention and Working Memory for Partially Observable Reinforcement Learning Problems. ACMSE 2021: Proceedings of the 2021 ACM Southeast Conference, 2021: 209-213.
  • [30] Graziano, Vincent; Koutnik, Jan; Schmidhuber, Juergen. Unsupervised Modeling of Partially Observable Environments. Machine Learning and Knowledge Discovery in Databases, Pt I, 2011, 6911: 503-515.