Probabilistic Reward-Based Reinforcement Learning for Multi-Agent Pursuit and Evasion

Cited by: 1

Authors:

Zhang, Bo-Kun [1]

Hu, Bin [1]

Chen, Long [1]

Zhang, Ding-Xue [2]

Cheng, Xin-Ming [3]

Guan, Zhi-Hong [1]

Affiliations:

[1] Huazhong Univ Sci & Technol, Sch Artificial Intelligence & Automat, Wuhan 430074, Peoples R China

[2] Yangtze Univ, Sch Petr Engn, Jingzhou 434023, Peoples R China

[3] Cent South Univ, Sch Automat, Changsha 430083, Peoples R China

Source:

PROCEEDINGS OF THE 33RD CHINESE CONTROL AND DECISION CONFERENCE (CCDC 2021) | 2021

Keywords:

Reinforcement learning; Multi-agent; Pursuit-evasion; Probabilistic reward; SYSTEMS;

DOI:

10.1109/CCDC52312.2021.9601771

Chinese Library Classification:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

This article studies reinforcement learning for multi-agent pursuit-evasion games. A main problem of current multi-agent reinforcement learning is the low learning efficiency of the agents. An important factor behind this problem is the delay of the Q function relative to changes in the environment. To address it, a probabilistic distribution of the reward value replaces the Q function in the multi-agent deep deterministic policy gradient (MADDPG) framework. The distributional Bellman equation is proved to be convergent and can be incorporated into the reinforcement learning algorithm. The probabilistic reward distribution is updated within the algorithm, so that the reward value adapts better to a complex environment. At the same time, eliminating the reward delay improves the efficiency of the strategy and yields better pursuit-evasion results. Simulations and experiments show that the multi-agent algorithm with distributional rewards achieves better results in the given environment setting.
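The abstract does not say how the reward distribution is parameterized. As a rough illustration only, the sketch below assumes a categorical distribution over a fixed, evenly spaced set of return values ("atoms"), which is one common way to implement a distributional Bellman update and may differ from the authors' method. The function name `project_distributional_target` and all of its parameters are hypothetical.

```python
from typing import List


def project_distributional_target(
    probs: List[float],
    atoms: List[float],
    reward: float,
    gamma: float,
) -> List[float]:
    """Project the distribution of reward + gamma * Z back onto fixed atoms.

    Assumption: `atoms` is sorted and evenly spaced, and `probs[i]` is the
    probability of the return `atoms[i]` at the next state-action pair.
    """
    v_min, v_max = atoms[0], atoms[-1]
    delta = (v_max - v_min) / (len(atoms) - 1)
    projected = [0.0] * len(atoms)
    for p, z in zip(probs, atoms):
        # Bellman shift of one atom, clipped to the supported range.
        tz = min(max(reward + gamma * z, v_min), v_max)
        # Fractional position of the shifted atom on the grid.
        b = (tz - v_min) / delta
        lower = int(b)
        upper = min(lower + 1, len(atoms) - 1)
        frac = b - lower
        # Split the probability mass between the two neighbouring atoms.
        projected[lower] += p * (1.0 - frac)
        projected[upper] += p * frac
    return projected
```

In a distributional critic of this kind, the projected probabilities would serve as the training target for the critic's predicted distribution, in place of the scalar Q-value target used in standard MADDPG.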

Pages: 3352-3357

Page count: 6

