Probabilistic Reward-Based Reinforcement Learning for Multi-Agent Pursuit and Evasion

Cited by: 1
Authors
Zhang, Bo-Kun [1]
Hu, Bin [1]
Chen, Long [1]
Zhang, Ding-Xue [2]
Cheng, Xin-Ming [3]
Guan, Zhi-Hong [1]
Affiliations
[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China
[2] Yangtze Univ, Sch Petr Engn, Jingzhou 434023, Peoples R China
[3] Cent South Univ, Sch Automat, Changsha 410083, Peoples R China
Source
PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021) | 2021
Keywords
Reinforcement learning; Multi-agent; Pursuit-evasion; Probabilistic reward; SYSTEMS;
DOI
10.1109/CCDC52312.2021.9601771
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
This paper studies reinforcement learning for multi-agent pursuit-evasion games. The main problem with current multi-agent reinforcement learning is the low learning efficiency of the agents. An important contributing factor is the delay of the Q function in responding to a changing environment. To address this, a probabilistic reward distribution replaces the Q function in the multi-agent deep deterministic policy gradient (MADDPG) framework. The distributional Bellman equation is proved to converge and can therefore be incorporated into the reinforcement learning algorithm. The reward distribution is updated within the algorithm, so that the reward value adapts better to complex environments. At the same time, eliminating the reward delay improves the efficiency of the strategy and yields better pursuit-evasion results. Simulations and experiments show that the multi-agent algorithm with distributional rewards achieves better results in the given environment.
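The abstract does not say how the reward distribution is parameterized or updated. As a minimal sketch only, assuming a categorical distribution over a fixed grid of return values (a common choice in distributional reinforcement learning, not confirmed as the authors' method), one distributional Bellman backup for a single transition could look like the following. The function name `project_distribution` and the parameters `v_min`, `v_max` are hypothetical and chosen for illustration.

```python
import math


def project_distribution(next_probs, reward, gamma, v_min, v_max):
    """Apply one distributional Bellman backup, r + gamma * Z, and project
    the result back onto a fixed grid of atoms (assumed parameterization)."""
    n_atoms = len(next_probs)
    delta = (v_max - v_min) / (n_atoms - 1)
    atoms = [v_min + i * delta for i in range(n_atoms)]
    target = [0.0] * n_atoms
    for p, z in zip(next_probs, atoms):
        # Shift and scale the atom, then clip it to the supported range.
        tz = min(max(reward + gamma * z, v_min), v_max)
        # Fractional grid index of the shifted atom, guarded against rounding.
        b = min((tz - v_min) / delta, n_atoms - 1)
        lower, upper = math.floor(b), math.ceil(b)
        if lower == upper:
            # The shifted atom lands exactly on a grid point.
            target[lower] += p
        else:
            # Split the probability mass linearly between the two neighbors.
            target[lower] += p * (upper - b)
            target[upper] += p * (b - lower)
    return atoms, target


if __name__ == "__main__":
    probs = [0.1, 0.2, 0.3, 0.3, 0.1]
    atoms, target = project_distribution(probs, reward=0.0, gamma=0.5,
                                         v_min=-1.0, v_max=1.0)
    print(atoms)
    print(target)
```

In a MADDPG-style setup, a critic under this assumption would output `next_probs` for the joint observation and action instead of a scalar Q value, and the projected `target` would serve as the training target for the critic's predicted distribution. When no clipping occurs, the projection preserves both the total probability and the mean of the shifted distribution.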
Pages: 3352-3357
Page count: 6