Closely Cooperative Multi-Agent Reinforcement Learning Based on Intention Sharing and Credit Assignment

Cited by: 0

Authors:

Fu, Hao ^{[1,2]}

You, Mingyu ^{[1,2]}

Zhou, Hongjun ^{[1,2]}

He, Bin ^{[1,2]}

Affiliations:

[1] Tongji Univ, Shanghai Res Inst Intelligent Autonomous Syst, Coll Elect & Informat Engn, Shanghai 200070, Peoples R China

[2] Frontiers Sci Ctr Intelligent Autonomous Syst, State Key Lab Intelligent Autonomous Syst, Shanghai Key Lab Intelligent Autonomous Syst, Shanghai 201203, Peoples R China

Source:

IEEE ROBOTICS AND AUTOMATION LETTERS | 2024 / Vol. 9 / Issue 12

Funding:

National Natural Science Foundation of China;

Keywords:

Reinforcement learning; Collaboration; Encoding; Training; Multi-agent systems; Autonomous systems; Mutual information; Decision making; Trajectory; Synchronization; MARL; closely collaborative tasks; intention sharing; credit assignment;

DOI:

10.1109/LRA.2024.3497661

Chinese Library Classification (CLC) number:

TP24 [Robotics];

Discipline classification code:

080202 ; 1405 ;

Abstract:

Collaborative tasks are important in multi-agent systems, and multi-agent reinforcement learning (MARL) is a commonly used technique for learning cooperative multi-agent policies. The closely collaborative task is a special but common case of cooperative tasks, in which a change in the environmental state requires multiple agents to perform specific actions simultaneously. An example is a box-pushing task in which the boxes are heavy and must be pushed by multiple agents at the same time. Closely collaborative tasks pose some unique challenges. First, completing such a task requires agents to synchronize their actions, which in turn requires a consistent intention among them. Second, when some agents' erroneous actions lead to task failure, it is difficult to avoid incorrectly penalizing the agents that acted correctly. These challenges cause most existing MARL methods to perform poorly on such tasks. In this letter, we propose a closely collaborative multi-agent reinforcement learning (CC-MARL) algorithm based on intention sharing and credit assignment. We use two-phase training to learn intention encoding and intention sharing, respectively, and decompose joint action values based on the idea of a counterfactual baseline. We deploy scenarios in both simulated and real environments with various sizes, numbers of boxes, and numbers of agents, and compare CC-MARL with various classical MARL algorithms on box-pushing tasks of different map scales in simulation, demonstrating the state-of-the-art performance of our method.
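
The credit-assignment problem described in the abstract can be illustrated with a generic counterfactual baseline. The following is a minimal sketch, not the CC-MARL implementation: the two-agent setup, the toy joint action-value table `Q`, the uniform policy, and the function name `counterfactual_advantage` are all assumptions made for illustration. Each agent's advantage is the joint action value minus the expected joint value obtained by varying only that agent's action while the other agent's action is held fixed.

```python
def counterfactual_advantage(q_joint, policy_i, joint_action, agent):
    """Counterfactual-baseline advantage for one agent in a two-agent task.

    q_joint[a0][a1] is a (hypothetical) joint action value.
    policy_i is the action distribution of the agent being evaluated.
    """
    a0, a1 = joint_action
    q_taken = q_joint[a0][a1]
    if agent == 0:
        # Marginalize over agent 0's actions, keeping agent 1's action fixed.
        baseline = sum(p * q_joint[alt][a1] for alt, p in enumerate(policy_i))
    else:
        # Marginalize over agent 1's actions, keeping agent 0's action fixed.
        baseline = sum(p * q_joint[a0][alt] for alt, p in enumerate(policy_i))
    return q_taken - baseline


# Toy box-pushing values (assumed): action 0 = wait, action 1 = push.
# The heavy box moves, and value 1.0 is obtained, only if both agents push.
Q = [[0.0, 0.0],
     [0.0, 1.0]]
uniform = [0.5, 0.5]

# Agent 0 pushed but agent 1 waited, so the joint action failed.
adv_pusher = counterfactual_advantage(Q, uniform, (1, 0), agent=0)
adv_waiter = counterfactual_advantage(Q, uniform, (1, 0), agent=1)
print(adv_pusher, adv_waiter)
```

In this toy case the agent that pushed receives an advantage of zero, because no alternative action of its own would have changed the outcome, while the agent that waited receives a negative advantage. This mirrors the abstract's goal of not penalizing agents that acted correctly.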

Pages: 11770-11777

Number of pages: 8
