Research Progress of Multi-Agent Deep Reinforcement Learning

Cited by: 0
Authors
Ding, Shi-Fei [1, 2]
Du, Wei [1]
Zhang, Jian [1, 2]
Guo, Li-Li [1, 2]
Ding, Ding [3 ]
Affiliations
[1] School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu
[2] Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Xuzhou, Jiangsu
[3] College of Intelligence and Computing, Tianjin University, Tianjin
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2024 / Vol. 47 / No. 7
Keywords
communication learning; graph neural network; multi-agent deep reinforcement learning; policy-based; value-based
DOI
10.11897/SP.J.1016.2024.01547
Abstract
Reinforcement learning is a classical machine learning approach for solving complex decision-making problems. With the advent of the era of artificial intelligence, deep learning has achieved remarkable success thanks to vast amounts of data and the increase in computing power brought by hardware development. Deep reinforcement learning (DRL) has attracted wide attention in recent years and has achieved remarkable success in various fields. Because real-world environments usually involve multiple agents interacting with the environment, multi-agent deep reinforcement learning (MADRL) has developed vigorously and achieved excellent performance on a variety of complex sequential decision-making tasks. This paper surveys the research progress of multi-agent deep reinforcement learning in three parts. First, we review several common multi-agent reinforcement learning problem representations, such as Markov games and partially observable Markov games (sketched formally after this abstract), and their corresponding cooperative, competitive, and mixed cooperative-competitive tasks. Second, we propose a new multi-dimensional classification of current MADRL methods and introduce the methods in each category. Concretely, we divide MADRL methods into value-based and policy-based methods according to how the optimal policy is obtained. We further divide MADRL methods by applicable task type into methods for cooperative tasks and methods for general tasks (cooperative, competitive, or mixed). In addition, we introduce a third dimension, whether a communication mechanism is established between agents, dividing MADRL methods into communication-based and communication-free methods. Based on these three dimensions, popular MADRL methods are divided into eight categories. Among them, we focus on value function decomposition methods, communication-based MADRL methods, and graph neural network based MADRL methods. Value function decomposition methods are further divided into simple factorization methods, IGM-principle-based methods, and others. Communication structures are divided into fully connected, star, tree, neighbor, and layered types. Third, we review the main applications of MADRL methods in real-world scenarios such as autonomous driving, traffic signal control, and recommendation systems. The classification in this paper covers several common MADRL problem representations and model-free MADRL methods, so there remain many promising directions outside its focus, which we briefly analyze in Section 5, including extensive-form game problems, model-based MADRL methods, and safe and robust MADRL. Finally, we summarize the paper. With the rapid development of deep learning methods, the MARL field is changing rapidly, and many previously intractable problems are gradually becoming easier to handle with MADRL methods. MADRL is a developing field that attracts growing interest from scholars but still faces many challenges, such as non-stationarity, the curse of dimensionality, and credit assignment. Overall, DRL can improve the intelligence and efficiency of systems in various fields by learning optimal decision strategies, bringing tremendous impact and change to human society. In this paper, we provide a broad overview of the latest work in the emerging field of multi-agent deep reinforcement learning, together with open directions such as extensive-form games, model-based MADRL, and safe and robust MADRL.
We expect this paper to be helpful both to new researchers entering this rapidly developing field and to existing experts who want to gain a comprehensive understanding and identify new directions based on the latest advances. © 2024 Science Press. All rights reserved.
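The problem representations and the IGM principle named in the abstract can be summarized compactly. The following is a minimal sketch in generic textbook notation (the symbols below are illustrative assumptions, not necessarily the notation used in the paper itself):

% Markov game (stochastic game) with N agents -- generic notation, assumed here for illustration.
\[
  \mathcal{G} = \langle \mathcal{N},\, \mathcal{S},\, \{\mathcal{A}_i\}_{i=1}^{N},\, P,\, \{R_i\}_{i=1}^{N},\, \gamma \rangle ,
\]
% S: state space; A_i: action space of agent i; P(s' | s, a_1, ..., a_N): transition function;
% R_i: reward function of agent i; gamma in [0,1): discount factor.
% A partially observable Markov game additionally has observation spaces {O_i} and an
% observation function O(o_1, ..., o_N | s), so each agent i acts on its own
% action-observation history tau_i instead of the true state s.
% Cooperative tasks: all R_i coincide (shared reward); competitive tasks: rewards are opposed
% (e.g., zero-sum); mixed tasks: neither constraint holds.
%
% IGM (Individual-Global-Max) condition used by value function decomposition methods:
% the joint greedy action of the team value Q_tot coincides with the individual greedy actions.
\[
  \arg\max_{\boldsymbol{a}} Q_{tot}(\boldsymbol{\tau}, \boldsymbol{a})
  = \Bigl( \arg\max_{a_1} Q_1(\tau_1, a_1),\, \dots,\, \arg\max_{a_N} Q_N(\tau_N, a_N) \Bigr).
\]

Simple factorization (e.g., summing the individual Q_i, as in VDN) satisfies this condition by construction, while IGM-principle-based methods such as monotonic mixing enforce it through constraints on how Q_tot is built from the individual Q_i; this is the distinction behind the abstract's taxonomy of value function decomposition methods.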
Pages: 1547-1567
Page count: 20