Multi-agent deep reinforcement learning via double attention and adaptive entropy

Cited by: 0
Authors
Wu, Pei-Liang [1 ,2 ]
Yuan, Xu-Dong [1 ,2 ]
Mao, Bing-Yi [1 ,2 ]
Chen, Wen-Bai [3 ]
Gao, Guo-Wei [3 ]
Affiliations
[1] School of Information Science and Engineering, Yanshan University, Hebei, Qinhuangdao
[2] The Key Laboratory for Computer Virtual Technology and System Integration of Hebei Province, Hebei, Qinhuangdao
[3] School of Automation, Beijing Information Science and Technology University, Beijing
Source
Kongzhi Lilun Yu Yingyong/Control Theory and Applications | 2024 / Vol. 41 / No. 10
Funding
National Natural Science Foundation of China;
Keywords
actor-critic; adaptive entropy; attention; multi-agent systems; reinforcement learning;
DOI
10.7641/CTA.2023.21023
Abstract
Actor-critic algorithms and maximum entropy reinforcement learning suffer from overestimation of the value function and fragile temperature parameters, which can drive the policy network into local optima. To address these problems, this paper proposes an algorithm based on a double attention mechanism and adaptive temperature parameters. First, two attention critic networks with different initial parameters are constructed to evaluate the policy network more accurately, avoiding the overestimation that traps the policy network in local optima. Second, a maximum entropy reinforcement learning algorithm with adaptive temperature parameters is proposed: it computes each agent's policy entropy and a baseline entropy, and dynamically adjusts the temperature parameter so that each agent's exploration is adapted accordingly. Finally, the effectiveness of the proposed algorithm is verified in a constrained cooperative navigation environment and a constrained treasure collection environment, where its average total cost and average total penalty are superior to those of the compared algorithms. © 2024 South China University of Technology. All rights reserved.
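A minimal sketch of the two mechanisms described in the abstract, written against the abstract only; the paper's actual architecture and update rules may differ. The attention critic follows the MAAC-style layout of reference [6]; taking the element-wise minimum of the two critics is the clipped-double-Q device of reference [4] and is assumed here, as is the temperature update that compares each agent's policy entropy with its baseline entropy. All names (AttentionCritic, twin_critic_target, adapt_temperature) and hyperparameters are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn


class AttentionCritic(nn.Module):
    """Attention critic: each agent's Q-value attends to the encoded
    observation-action pairs of all agents (MAAC-style layout)."""

    def __init__(self, n_agents, obs_dim, act_dim, hidden=64):
        super().__init__()
        self.encoders = nn.ModuleList(
            [nn.Linear(obs_dim + act_dim, hidden) for _ in range(n_agents)]
        )
        self.attn = nn.MultiheadAttention(hidden, num_heads=1, batch_first=True)
        self.q_heads = nn.ModuleList(
            [nn.Linear(2 * hidden, 1) for _ in range(n_agents)]
        )

    def forward(self, obs, act):
        # obs, act: lists with one (batch, dim) tensor per agent
        e = torch.stack(
            [enc(torch.cat([o, a], dim=-1))
             for enc, o, a in zip(self.encoders, obs, act)],
            dim=1,
        )                                    # (batch, n_agents, hidden)
        ctx, _ = self.attn(e, e, e)          # attention context per agent
        return [
            head(torch.cat([e[:, i], ctx[:, i]], dim=-1))
            for i, head in enumerate(self.q_heads)
        ]                                    # list of (batch, 1) Q-values


def twin_critic_target(critic1, critic2, next_obs, next_act,
                       log_probs, rewards, gamma, alphas):
    """Double-critic target: the element-wise minimum of two independently
    initialized attention critics curbs value overestimation."""
    q1 = critic1(next_obs, next_act)
    q2 = critic2(next_obs, next_act)
    targets = []
    for i in range(len(q1)):
        min_q = torch.min(q1[i], q2[i])
        # soft value with per-agent temperature alphas[i]
        targets.append(rewards[i] + gamma * (min_q - alphas[i] * log_probs[i]))
    return targets


def adapt_temperature(alphas, policy_entropy, baseline_entropy, lr=1e-3):
    """Adaptive temperature: raise alpha when an agent's policy entropy drops
    below its baseline entropy (more exploration), lower it otherwise."""
    for i in range(len(alphas)):
        alphas[i] = torch.clamp(
            alphas[i] + lr * (baseline_entropy[i] - policy_entropy[i]),
            min=1e-4,
        )
    return alphas
```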
Pages: 1930-1936
Number of pages: 6
Related papers
Total: 9
  • [1] XU T, LIANG Y, LAN G., CRPO: A new approach for safe reinforcement learning with convergence guarantee, International Conference on Machine Learning, pp. 11480-11491, (2021)
  • [2] REDDY D S K, SAHA A, TAMILSELVAM S G, et al., Risk averse reinforcement learning for mixed multi-agent environments, Proceedings of the 18th International Conference on Autonomous Agents and Multiagent Systems, pp. 2171-2173, (2019)
  • [3] MANNOR S, SIMESTER D, SUN P, et al., Bias and variance approximation in value function estimates, Management Science, 53, 2, pp. 308-322, (2007)
  • [4] FUJIMOTO S, VAN HOOF H, MEGER D., Addressing function approximation error in actor-critic methods, International Conference on Machine Learning, pp. 1587-1596, (2018)
  • [5] VAN EDE F, NOBRE A C., Turning attention inside out: How working memory serves behavior, Annual Review of Psychology, 74, pp. 137-165, (2023)
  • [6] IQBAL S, SHA F., Actor-attention-critic for multi-agent reinforcement learning, International Conference on Machine Learning, pp. 2961-2970, (2019)
  • [7] PARNIKA P, DIDDIGI R B, DANDA S K R, et al., Attention actor-critic algorithm for multi-agent constrained co-operative reinforcement learning, (2021)
  • [8] HAARNOJA T, ZHOU A, ABBEEL P, et al., Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, International Conference on Machine Learning, pp. 1861-1870, (2018)
  • [9] ZHOU M, LIU Z, SUI P, et al., Learning implicit credit assignment for cooperative multi-agent reinforcement learning, Advances in Neural Information Processing Systems, 33, pp. 11853-11864, (2020)