Explicitly Learning Policy Under Partial Observability in Multiagent Reinforcement Learning

Cited by: 1
Authors
Yang, Chen [1 ,2 ]
Yang, Guangkai [1 ,2 ]
Chen, Hao [1 ]
Zhang, Junge [1 ,2 ]
Affiliations
[1] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[2] Chinese Acad Sci, Inst Automat, Beijing 100049, Peoples R China
Source
2023 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, IJCNN | 2023
Funding
National Natural Science Foundation of China
Keywords
Multiagent reinforcement learning; partial observability; knowledge distillation
DOI
10.1109/IJCNN54540.2023.10191476
CLC number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We explore explicit solutions for multiagent reinforcement learning (MARL) under the constraint of partial observability. Under the general framework of centralized training with decentralized execution (CTDE), existing methods implicitly alleviate partial observability by introducing global information during centralized training. However, such an implicit solution cannot fully address partial observability and exhibits low sample efficiency in many MARL problems. In this paper, we focus on the influence of partial observability on agents' policies, and formally derive an ideal form of policy that maximizes the MARL objective under partial observability. Furthermore, we develop a new method named Explicitly Learning Policy (ELP), which adopts a novel teacher-student structure and utilizes knowledge distillation to explicitly learn an individual policy under partial observability for each agent. Compared to prior methods, ELP presents a more general and interpretable training process, and the ELP procedure can be easily extended to existing methods for a performance boost. Our empirical experiments on the StarCraft II micromanagement benchmark show that ELP significantly outperforms prevailing state-of-the-art baselines, which demonstrates the advantage of ELP in addressing partial observability and improving sample efficiency.
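The abstract names a teacher-student structure with knowledge distillation but does not state the exact objective. A common instantiation of such a scheme (an assumption for illustration, not the paper's verified loss) distills a teacher policy conditioned on the global state, available during centralized training, into a student policy conditioned only on the agent's local observation, by minimizing the KL divergence between their action distributions:

```python
import numpy as np

def softmax(logits):
    """Convert action logits to a probability distribution (numerically stable)."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(teacher_logits, student_logits, eps=1e-12):
    """Mean KL(teacher || student) over a batch of decision points.

    teacher_logits: produced from the global state (centralized training only).
    student_logits: produced from the agent's local observation (decentralized
    execution). Both have shape (batch, n_actions). The exact conditioning and
    loss form are assumptions here, not taken from the paper.
    """
    p = softmax(teacher_logits)       # teacher action distribution
    q = softmax(student_logits)       # student action distribution
    log_p = np.log(p + eps)
    log_q = np.log(q + eps)
    # KL(p || q) = sum_a p(a) * (log p(a) - log q(a)), averaged over the batch
    return float(np.mean(np.sum(p * (log_p - log_q), axis=-1)))
```

Using the forward KL in this direction forces the student to cover every action mode the state-conditioned teacher assigns probability to, which matches the intuition of transferring a privileged policy to a partially observing agent.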
Pages: 8