Preference-based experience sharing scheme for multi-agent reinforcement learning in multi-target environments

Cited by: 1
Authors
Zuo, Xuan [1 ]
Zhang, Pu [1 ,2 ]
Li, Hui-Yan [3 ]
Liu, Zhun-Ga [1 ]
Affiliations
[1] Northwestern Polytech Univ, Sch Automat, West Youyi Rd, Xian 710072, Shaanxi, Peoples R China
[2] Xian Univ Technol, Sch Automat & Informat Engn, Jinhua Rd, Xian 710048, Shaanxi, Peoples R China
[3] China Aerosp Acad Syst Sci & Engn, Fucheng Rd, Beijing 100048, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Experience sharing; Multi-agent; Target assignment; Reinforcement learning; Experience replay; UAV; ASSIGNMENT;
DOI
10.1007/s12530-024-09587-4
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline codes
081104; 0812; 0835; 1405
Abstract
Multi-agent reinforcement learning is a varied and highly active field of research. The ideas of parameter sharing and experience sharing have recently been introduced into multi-agent reinforcement learning to accelerate the training of multiple neural networks and improve final returns. However, implementing parameter or experience sharing in multi-agent environments can introduce additional constraints or computational cost. This work presents a preference-based experience sharing scheme that allows for different policies in environments with weakly homogeneous agents and requires barely any additional computation. In this scheme, the experience replay buffer is augmented with a choice vector that indicates the agent's preferred target, and each agent can learn various policies from the experience collected by other agents that chose the same target. PSE-MADDPG, an off-policy algorithm built on the preference-based experience sharing scheme, is proposed and benchmarked on a multi-target assignment and cooperative navigation mission. Experimental results show that PSE-MADDPG successfully solves the multi-target assignment problem and outperforms two classical deep reinforcement learning algorithms, learning in fewer steps and converging to higher episode rewards. Meanwhile, PSE-MADDPG relaxes the strong homogeneity assumption on agents and requires little additional computational cost.
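Although the record gives no implementation details, the scheme described in the abstract can be pictured as a replay buffer whose transitions carry an extra target-choice vector, so that an agent samples only from experience generated under the same choice. The sketch below is a minimal illustration of that idea; the class, method, and field names are assumptions for clarity, not taken from the paper.

```python
import random
from collections import deque, namedtuple

# One transition, extended with the agent's target-choice vector.
# Field names are illustrative; the paper does not specify an implementation.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done", "choice"]
)


class PreferenceReplayBuffer:
    """Replay buffer whose samples can be filtered by preferred target."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done, choice):
        # 'choice' marks the preferred target, e.g. a one-hot tuple such as (0, 1, 0).
        self.buffer.append(Transition(state, action, reward, next_state, done, choice))

    def sample(self, batch_size, choice=None):
        # With 'choice' given, draw only from experience collected by agents that
        # selected the same target, so their data can be shared across policies.
        pool = (
            [t for t in self.buffer if t.choice == choice]
            if choice is not None
            else list(self.buffer)
        )
        return random.sample(pool, min(batch_size, len(pool)))
```

In use, every agent would push its transitions into a shared buffer of this kind and, at training time, request a batch filtered by its own choice vector, which is how the abstract's "sharing among agents that choose the same target" could be realized without extra networks or synchronization.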
Pages: 1681-1699
Number of pages: 19