Preference-based experience sharing scheme for multi-agent reinforcement learning in multi-target environments

Cited by: 1
Authors
Zuo, Xuan [1 ]
Zhang, Pu [1 ,2 ]
Li, Hui-Yan [3 ]
Liu, Zhun-Ga [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, West Youyi Rd, Xian 710072, Shaanxi, Peoples R China
[2] Xian Univ Technol, Sch Automat & Informat Engn, Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
[3] China Aerosp Acad Syst Sci & Engn, Fucheng Rd, Beijing 100048, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Experience sharing; Multi-agent; Target assignment; Reinforcement learning; Experience replay; UAV; ASSIGNMENT;
DOI
10.1007/s12530-024-09587-4
CLC Number
TP18 [Artificial intelligence theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
Multi-agent reinforcement learning is a varied and highly active field of research. The ideas of parameter sharing and experience sharing have recently been introduced into multi-agent reinforcement learning to accelerate the training of multiple neural networks and improve the final returns. However, implementing parameter or experience sharing methods in multi-agent environments can introduce additional constraints or computational costs. This work presents a preference-based experience sharing scheme that allows for different policies in environments with weakly homogeneous agents and requires almost no additional computation. In this scheme, the experience replay buffer is augmented with a choice vector that indicates the agent's preferred target, and each agent can learn its policy from the experience data collected by other agents that chose the same target. PSE-MADDPG, an off-policy algorithm built on this preference-based experience sharing scheme, is proposed and benchmarked on a multi-target assignment and cooperative navigation mission. Experimental results show that PSE-MADDPG successfully solves the multi-target assignment problem and outperforms two classical deep reinforcement learning algorithms, learning in fewer steps and converging to higher episode rewards. Meanwhile, PSE-MADDPG relaxes the strong-homogeneity assumption on agents and incurs little additional computational cost.
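The sharing mechanism described in the abstract can be pictured as a replay buffer keyed by each agent's target choice. The sketch below is a minimal illustration of that idea in Python; the class name, tuple layout, and pooling strategy are assumptions for exposition, not the authors' PSE-MADDPG implementation.

```python
import random
from collections import defaultdict

class PreferenceSharedReplayBuffer:
    """Hypothetical sketch: transitions are tagged with the agent's preferred
    target, and any agent may sample transitions stored by other agents that
    chose the same target."""

    def __init__(self, capacity=100_000):
        self.capacity = capacity
        # One FIFO pool of transitions per preferred-target index.
        self.pools = defaultdict(list)

    def add(self, target_choice, obs, action, reward, next_obs, done):
        pool = self.pools[target_choice]
        if len(pool) >= self.capacity:
            pool.pop(0)  # drop the oldest transition for this target
        pool.append((obs, action, reward, next_obs, done))

    def sample(self, target_choice, batch_size):
        # An agent preferring `target_choice` learns from every agent that
        # recorded experience for that same target.
        pool = self.pools[target_choice]
        return random.sample(pool, min(batch_size, len(pool)))
```

Because experiences are only pooled across agents with the same target choice, agents remain free to learn different policies (weak homogeneity), and the only overhead is the extra choice tag stored with each transition.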
Pages: 1681-1699
Page count: 19