Probabilistic Reward-Based Reinforcement Learning for Multi-Agent Pursuit and Evasion

Cited by: 1

Authors:

Zhang, Bo-Kun [1]

Hu, Bin [1]

Chen, Long [1]

Zhang, Ding-Xue [2]

Cheng, Xin-Ming [3]

Guan, Zhi-Hong [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China

[2] Yangtze Univ, Sch Petr Engn, Jingzhou 434023, Peoples R China

[3] Cent South Univ, Sch Automat, Changsha 430083, Peoples R China

Source:

PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021) | 2021

Keywords:

Reinforcement learning; Multi-agent; Pursuit-evasion; Probabilistic reward; SYSTEMS;

DOI:

10.1109/CCDC52312.2021.9601771

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This article studies reinforcement learning for multi-agent pursuit-evasion games. The main problem with current multi-agent reinforcement learning is the low learning efficiency of the agents. An important contributing factor is the delay of the Q function in responding to a changing environment. To address this, a probabilistic reward distribution replaces the Q function in the multi-agent deep deterministic policy gradient (MADDPG) framework. The distributional Bellman equation is proved to be convergent and can therefore be incorporated into the reinforcement learning algorithm. The probabilistic reward distribution is updated within the algorithm, so that the reward value adapts better to complex environments. At the same time, eliminating the reward delay improves the efficiency of the strategy and yields better pursuit-evasion results. Simulations and experiments show that the multi-agent algorithm with distributional rewards achieves better results in the given environment.
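The abstract does not say how the reward distribution is parameterized or updated, so the following is only an illustrative sketch and not the authors' method. It assumes a categorical distribution over a fixed, evenly spaced grid of return values (a common choice in the distributional reinforcement learning literature) and shows one distributional Bellman backup: each atom z is shifted to reward + gamma * z and its probability is split between the two nearest grid points. The function name `project_categorical` and all parameter names are hypothetical.

```python
def project_categorical(probs, atoms, reward, gamma):
    """One distributional Bellman backup on a fixed categorical support.

    Assumption (not from the source): the return distribution is a list of
    probabilities `probs` over evenly spaced, increasing `atoms`.
    Returns the probabilities of reward + gamma * Z projected onto `atoms`.
    """
    n = len(atoms)
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (n - 1)  # spacing between neighbouring atoms
    out = [0.0] * n
    for p, z in zip(probs, atoms):
        # Shift and scale the atom, then clip it to the supported range.
        tz = min(max(reward + gamma * z, v_min), v_max)
        b = (tz - v_min) / delta       # fractional index on the grid
        lo = min(int(b), n - 1)        # lower neighbour (b is non-negative)
        hi = min(lo + 1, n - 1)        # upper neighbour, clamped to the grid
        frac = b - lo
        # Split the probability mass linearly between the two neighbours.
        out[lo] += p * (1.0 - frac)
        out[hi] += p * frac
    return out


# Usage: all mass on return 2.0, reward 0.5, discount 0.5.
# The backed-up return is 0.5 + 0.5 * 2.0 = 1.5, halfway between atoms 1 and 2.
atoms = [0.0, 1.0, 2.0, 3.0, 4.0]
probs = [0.0, 0.0, 1.0, 0.0, 0.0]
backed_up = project_categorical(probs, atoms, reward=0.5, gamma=0.5)
```

In a MADDPG-style setup, a distribution of this kind would take the place of the scalar critic output; how the paper integrates it with the actor update is not stated in this record.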


Pages: 3352-3357

Page count: 6
