Research Progress of Multi-Agent Deep Reinforcement Learning

Cited by: 0
Authors
Ding, Shi-Fei [1,2]
Du, Wei [1]
Zhang, Jian [1,2]
Guo, Li-Li [1,2]
Ding, Ding [3]
Affiliations
[1] School of Computer Science and Technology, China University of Mining and Technology, Jiangsu, Xuzhou
[2] Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Jiangsu, Xuzhou
[3] College of Intelligence and Computing, Tianjin University, Tianjin
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2024, Vol. 47, No. 07
Keywords
communication learning; graph neural network; multi-agent deep reinforcement learning; policy-based; value-based;
DOI
10.11897/SP.J.1016.2024.01547
Abstract
Reinforcement learning is a traditional machine learning method for solving complex decision-making problems. With the advent of the era of artificial intelligence, deep learning has achieved remarkable success thanks to the vast amount of available data and the increase in computing power brought by hardware development. Deep reinforcement learning (DRL) has attracted wide attention in recent years and achieved remarkable success in various fields. Because real environments usually include multiple agents interacting with the environment, multi-agent deep reinforcement learning (MADRL) has developed vigorously and achieved excellent performance in a variety of complex sequential decision-making tasks. This paper summarizes the research progress of multi-agent deep reinforcement learning in three parts. First, we review several common multi-agent reinforcement learning problem representations, such as Markov games and partially observable Markov games, and their corresponding cooperative, competitive, and mixed cooperative-competitive tasks. Second, we propose a new multi-dimensional classification of current MADRL methods and introduce the methods in each category. Concretely, we divide MADRL methods into value-based and policy-based methods according to how they solve for optimal policies. Besides, we divide MADRL methods by applicable task type into those for cooperative tasks and those for general tasks (cooperative, competitive, or mixed). In addition, we introduce a new dimension, namely whether a communication mechanism is established between agents, dividing MADRL methods into communication-based and non-communication methods. Based on these three dimensions, popular MADRL methods are divided into eight categories. Among them, we focus on value function decomposition methods, communication-based MADRL methods, and graph neural network based MADRL methods.
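The Markov game representation mentioned above can be made concrete with a small sketch. The tuple (N, S, {A_i}, P, {R_i}, γ) and the toy "both agents must match actions" dynamics below are illustrative assumptions, not taken from the surveyed paper:

```python
from dataclasses import dataclass
from typing import Callable, Tuple

# A Markov (stochastic) game is the tuple (N, S, {A_i}, P, {R_i}, gamma):
# N agents, state set S, per-agent action sets A_i, a joint transition
# function P, per-agent reward functions R_i, and a discount factor gamma.
@dataclass
class MarkovGame:
    n_agents: int
    states: Tuple[str, ...]
    actions: Tuple[str, ...]  # shared action set, for simplicity
    transition: Callable[[str, Tuple[str, ...]], str]
    rewards: Callable[[str, Tuple[str, ...]], Tuple[float, ...]]
    gamma: float = 0.95

# A toy fully cooperative game: the agents move from "start" to "goal"
# and each earn +1 only when they choose the same action.
def toy_transition(state, joint_action):
    if state == "start" and joint_action[0] == joint_action[1]:
        return "goal"
    return state

def toy_rewards(state, joint_action):
    r = 1.0 if (state == "start" and joint_action[0] == joint_action[1]) else 0.0
    return (r, r)  # identical rewards for all agents => cooperative task

game = MarkovGame(2, ("start", "goal"), ("a", "b"), toy_transition, toy_rewards)
print(game.transition("start", ("a", "a")))  # -> goal
print(game.rewards("start", ("a", "b")))     # -> (0.0, 0.0)
```

Competitive or mixed tasks differ only in the reward functions: zero-sum rewards give a competitive game, and unrelated per-agent rewards give a mixed one.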
Value function decomposition methods can be divided into simple factorization, IGM-principle-based, and other methods. Communication structures are divided into fully connected, star, tree, neighbor, and layered types. In addition, we study the main applications of MADRL methods in real-world scenarios such as autonomous driving, traffic signal control, and recommendation systems. The classification in this paper covers several common types of MADRL problem representations and model-free MADRL methods, so there remain many directions that are not our focus but are promising, which we briefly analyze in Section 5, including extensive-form game problems, model-based MADRL methods, and safe and robust MADRL. Finally, we give a summary of the paper. With the rapid development of deep learning methods, the MARL field is changing rapidly, and many previously unsolvable problems are gradually becoming tractable with MADRL methods. MADRL is a developing field that attracts growing interest from scholars but also faces many challenges, such as non-stationarity, the curse of dimensionality, and credit assignment. Overall, DRL can improve the intelligence and efficiency of systems in various fields by learning optimal decision strategies, bringing tremendous impact and change to human society. In this paper, we provide a broad overview of the latest work in the emerging field of multi-agent deep reinforcement learning, including extensive-form games, model-based MADRL, and safe and robust MADRL. We expect this paper to be helpful both to new researchers entering this rapidly developing field and to existing experts who want to gain a comprehensive understanding and identify new directions based on the latest advances. © 2024 Science Press. All rights reserved.
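The "simple factorization" family mentioned above can be sketched briefly. In a VDN-style additive decomposition, Q_tot(s, a) = Σ_i Q_i(s, a_i), which satisfies the IGM (Individual-Global-Max) principle: each agent maximizing its own utility also maximizes the joint value. The utility tables below are toy numbers invented for illustration:

```python
import itertools

# Toy per-agent utilities Q_i(a_i) for a fixed state (values are made up).
q1 = {"left": 0.2, "right": 1.0}  # agent 1
q2 = {"left": 0.7, "right": 0.1}  # agent 2

# VDN-style additive joint value: Q_tot(a1, a2) = Q_1(a1) + Q_2(a2).
def q_tot(a1, a2):
    return q1[a1] + q2[a2]

# Centralized greedy joint action, found by exhaustive search over the
# joint action space (feasible here because the space is tiny)...
joint_best = max(itertools.product(q1, q2), key=lambda a: q_tot(*a))

# ...coincides with the decentralized per-agent argmaxes: the IGM property.
decentralized = (max(q1, key=q1.get), max(q2, key=q2.get))
print(joint_best, decentralized)  # -> ('right', 'left') ('right', 'left')
```

IGM-principle-based methods such as QMIX generalize this by replacing the sum with a learned monotonic mixing network, which enlarges the representable class of joint values while preserving the same argmax consistency.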
Pages: 1547-1567
Number of pages: 20
References
(120 in total)
[81]  
Wang R, He X, Yu R, et al., Learning efficient multi-agent communication: An information bottleneck approach, Proceedings of the 37th International Conference on Machine Learning, pp. 9908-9918, (2020)
[82]  
Liu Y, Wang W, Hu Y, et al., Multi-agent game abstraction via graph attention neural network, Proceedings of the 34th AAAI Conference on Artificial Intelligence, pp. 7211-7218, (2020)
[83]  
Ding Z, Huang T, Lu Z., Learning individually inferred communication for multi-agent cooperation, Proceedings of the 33rd Conference on Neural Information Processing Systems, pp. 22069-22079, (2020)
[84]  
Kim W, Park J, Sung Y., Communication in multi-agent reinforcement learning: Intention sharing, Proceedings of the 9th International Conference on Learning Representations, pp. 1-15, (2021)
[85]  
Du Y, Liu B, Moens V, et al., Learning correlated communication topology in multi-agent reinforcement learning, Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, pp. 456-464, (2021)
[86]  
Seraj E, Wang Z, Paleja R, et al., Learning efficient diverse communication for cooperative heterogeneous teaming, Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pp. 1173-1182, (2022)
[87]  
Das A, Gervet T, Romoff J, et al., TarMAC: Targeted multi-agent communication, Proceedings of the 36th International Conference on Machine Learning, pp. 1538-1546, (2019)
[88]  
Singh A, Jain T, Sukhbaatar S., Learning when to communicate at scale in multiagent cooperative and competitive tasks, (2018)
[89]  
Niu Y, Paleja R R, Gombolay M C., Multi-agent graph-attention communication and teaming, Proceedings of the 20th International Conference on Autonomous Agents and Multiagent Systems, pp. 964-973, (2021)
[90]  
Jaderberg M, Czarnecki W M, Dunning I, et al., Human-level performance in 3D multiplayer games with population-based reinforcement learning, Science, 364, 6443, pp. 859-865, (2019)