Multi-Task Multi-Agent Reinforcement Learning With Task-Entity Transformers and Value Decomposition Training

Cited by: 1
Authors
Zhu, Yuanheng [1 ,2 ]
Huang, Shangjing [1 ,2 ]
Zuo, Binbin [1 ,2 ]
Zhao, Dongbin [1 ,2 ]
Sun, Changyin [3 ]
Affiliations
[1] Chinese Acad Sci, Inst Automat, State Key Lab Multimodal Artificial Intelligence S, Beijing 100190, Peoples R China
[2] Univ Chinese Acad Sci, Sch Artificial Intelligence, Beijing 100049, Peoples R China
[3] Anhui Univ, Sch Artificial Intelligence, Hefei 230093, Peoples R China
Funding
Beijing Natural Science Foundation; National Natural Science Foundation of China;
Keywords
Multi-agent systems; reinforcement learning; multi-task learning; transformer; pretrained language model;
DOI
10.1109/TASE.2024.3501580
CLC Number
TP [Automation Technology, Computer Technology];
Discipline Code
0812;
Abstract
Multi-task multi-agent reinforcement learning aims to control multiple agents so that they perform well across multiple tasks. It faces three core challenges: the varying number of agents and entities, the disparities in cooperative behaviors across tasks, and the training imbalance caused by varying task difficulty. To address these issues, we propose a novel framework named Task-Entity Transformer Qmix (TETQmix), which employs pretrained language models for task encoding, utilizes the proposed Task-Entity Transformer to handle observations across various tasks, and adjusts task learning weights to achieve balanced multi-task training. The Task-Entity Transformer not only handles multi-task scenarios with varying numbers of agents and entities, but also leverages cross-attention modules to integrate observation and task embeddings, so that each agent can obtain individual values and decisions for multiple tasks. We then use a transformer-based mixer to monotonically combine the individual values, and train the whole network's parameters using temporal-difference errors. To facilitate multi-task training, we define task regret as the difference between the current-stage return and the candidate best one, and adjust the learning weight of each task based on its task regret. Experiments are conducted on both simulated multi-particle environments and real-world multi-robot systems. Compared with existing baselines, our method is not only superior in multi-task learning efficiency but also shows promising transfer ability on unseen tasks.
Note to Practitioners: The flexibility of multi-agent systems makes them well suited to multiple tasks. Compared to designing different decision models for different tasks, it is more convenient to use a single decision model to solve multiple tasks. Moreover, a single model makes maximum use of trajectory data from similar tasks when the data are integrated for multi-task training. Natural language provides a powerful tool to describe task context and to highlight the similarities and differences among tasks. Pretrained language models can encode the task context, based on which the decision model can adjust its output distribution for different tasks and even synthesize decisions from existing, similar tasks to achieve promising zero-shot and few-shot transfer performance on unseen tasks. With our proposed TETQmix, practitioners can realize multi-task capability in multi-agent systems and increase generalization across a variety of scenarios.
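The task-regret mechanism in the abstract can be illustrated with a minimal sketch. It assumes each task's regret is its candidate best return minus its current-stage return, and normalizes regrets into learning weights with a softmax; the paper's exact normalization rule is not given in this record, so the softmax and the `temperature` parameter are purely illustrative.

```python
import math

def task_regrets(current_returns, best_returns):
    """Per-task regret: the gap between the candidate best return and the
    return achieved at the current training stage (floored at zero)."""
    return [max(best - cur, 0.0) for cur, best in zip(current_returns, best_returns)]

def task_weights(current_returns, best_returns, temperature=1.0):
    """Turn regrets into learning weights that sum to one, so that
    harder (higher-regret) tasks receive larger training weight."""
    regrets = task_regrets(current_returns, best_returns)
    exps = [math.exp(r / temperature) for r in regrets]
    total = sum(exps)
    return [e / total for e in exps]
```

In a multi-task training loop, each task's temporal-difference loss would be scaled by its weight before the gradient step, shifting learning effort toward tasks that lag furthest behind their best achievable return.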
Pages: 9164-9177 (14 pages)