A Multi-Agent Centralized Strategy Gradient Reinforcement Learning Algorithm Based on State Transition

Cited: 0
Authors
Sheng, Lei [1 ,2 ]
Chen, Honghui [1 ]
Chen, Xiliang [2 ]
Affiliations
[1] Natl Univ Def Technol, Natl Key Lab Informat Syst Engn, Changsha 410073, Peoples R China
[2] Army Engn Univ, Sch Command & Control Engn, Nanjing 210007, Peoples R China
Keywords
automatic representation; state transition; exploration; exploitation; multi-agent reinforcement learning; deterministic strategy gradient algorithm; actor-critic methods;
DOI
10.3390/a17120579
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The prevalent use of deterministic strategy gradient algorithms in Multi-Agent Deep Reinforcement Learning (MADRL) for collaborative tasks poses a significant challenge to achieving stable, high-performance cooperative behavior. To balance exploration and exploitation for multi-agent ant robots in a partially observable continuous action space, this study introduces a multi-agent centralized strategy gradient algorithm grounded in a local state transition mechanism. The algorithm learns local state and local state-action representations from local observations and action values, thereby establishing a "local state transition" mechanism autonomously. Used as the input of the actor network, the automatically extracted local observation representation reduces the input state dimension, strengthens the local state features closely related to the local state transition, and encourages each agent to exploit the local state features that influence its next observed state. To mitigate non-stationarity and credit assignment issues in multi-agent environments, a centralized critic network evaluates the current joint strategy. The proposed algorithm, NST-FACMAC, is evaluated against other multi-agent deterministic strategy gradient algorithms in a continuous-control simulation environment using multi-agent ant robots. The experimental results show accelerated convergence and higher average reward values in cooperative multi-agent ant simulation environments. Notably, in the four simulated environments Ant-v2 (2 x 4), Ant-v2 (2 x 4d), Ant-v2 (4 x 2), and Manyant (2 x 3), the algorithm improves performance by approximately 1.9%, 4.8%, 11.9%, and 36.1%, respectively, over the best baseline algorithm. These findings underscore the algorithm's effectiveness in stabilizing multi-agent ant robot control in dynamic environments.
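The core representation-learning idea in the abstract can be sketched minimally: train a local observation encoder jointly with a transition model that predicts the next local observation from the current representation and action, so that the learned features are exactly those that matter for the "local state transition". This is a hedged illustration, not the paper's NST-FACMAC architecture; the linear encoder, transition model, dimensions, and synthetic linear dynamics (a stand-in for MuJoCo Ant observations) are all assumptions introduced here for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, act_dim, rep_dim = 8, 2, 4  # illustrative sizes, rep_dim < obs_dim

# Linear encoder z = W_enc @ o and transition model o' ≈ W_trans @ [z; a].
W_enc = rng.normal(scale=0.1, size=(rep_dim, obs_dim))
W_trans = rng.normal(scale=0.1, size=(obs_dim, rep_dim + act_dim))

def encode(o):
    return W_enc @ o

def predict_next(z, a):
    return W_trans @ np.concatenate([z, a])

# Synthetic transitions from random linear dynamics (not real Ant data).
A = rng.normal(scale=0.3, size=(obs_dim, obs_dim))
B = rng.normal(scale=0.3, size=(obs_dim, act_dim))
obs = rng.normal(size=(256, obs_dim))
acts = rng.normal(size=(256, act_dim))
next_obs = obs @ A.T + acts @ B.T

def loss():
    # Mean squared next-observation prediction error over the batch.
    err = 0.0
    for o, a, o2 in zip(obs, acts, next_obs):
        err += np.sum((predict_next(encode(o), a) - o2) ** 2)
    return err / len(obs)

lr = 1e-2
loss_before = loss()
for _ in range(200):
    grad_enc = np.zeros_like(W_enc)
    grad_trans = np.zeros_like(W_trans)
    for o, a, o2 in zip(obs, acts, next_obs):
        z = encode(o)
        x = np.concatenate([z, a])
        d = W_trans @ x - o2                    # prediction error
        grad_trans += 2 * np.outer(d, x)        # dL/dW_trans
        grad_enc += 2 * np.outer(W_trans[:, :rep_dim].T @ d, o)  # dL/dW_enc
    W_enc -= lr * grad_enc / len(obs)
    W_trans -= lr * grad_trans / len(obs)
loss_after = loss()
```

After training, `W_enc` compresses the observation into the `rep_dim` features most useful for predicting the next observation; in the paper's setting such a representation would then feed each agent's actor network, while a separate centralized critic (not sketched here) scores the joint strategy.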
Pages: 16