Research Progress of Multi-Agent Deep Reinforcement Learning

Cited: 0
Authors
Ding, Shi-Fei [1, 2]
Du, Wei [1]
Zhang, Jian [1, 2]
Guo, Li-Li [1, 2]
Ding, Ding [3]
Affiliations
[1] School of Computer Science and Technology, China University of Mining and Technology, Jiangsu, Xuzhou
[2] Mine Digitization Engineering Research Center of the Ministry of Education, China University of Mining and Technology, Jiangsu, Xuzhou
[3] College of Intelligence and Computing, Tianjin University, Tianjin
Source
Jisuanji Xuebao/Chinese Journal of Computers | 2024, Vol. 47, No. 07
Keywords
communication learning; graph neural network; multi-agent deep reinforcement learning; policy-based; value-based;
DOI
10.11897/SP.J.1016.2024.01547
Abstract
Reinforcement learning is a classical machine learning approach to solving complex decision-making problems. With the advent of the artificial intelligence era, deep learning has achieved remarkable success thanks to vast amounts of data and the increase in computing power brought by hardware development. Deep reinforcement learning (DRL) has attracted widespread attention in recent years and achieved remarkable success in various fields. Because real environments usually contain multiple agents interacting with the environment, multi-agent deep reinforcement learning (MADRL) has developed rapidly and achieved excellent performance on a variety of complex sequential decision-making tasks. This paper summarizes the research progress of multi-agent deep reinforcement learning in three parts. First, we review several common multi-agent reinforcement learning problem representations, such as Markov games and partially observable Markov games, and their corresponding cooperative, competitive, and mixed cooperative-competitive tasks. Second, we propose a new multi-dimensional classification of current MADRL methods and introduce the methods in each category. Concretely, we divide MADRL into value-based and policy-based methods according to how they solve for optimal policies. We further divide MADRL methods by applicable task type into those for cooperative tasks and those for general tasks (cooperative, competitive, or mixed). In addition, we introduce a third dimension, whether a communication mechanism is established between agents, dividing MADRL into communication-based and non-communication methods. Based on these three dimensions, popular MADRL methods fall into eight categories. Among them, we focus on value function decomposition methods, communication-based MADRL methods, and graph neural network based MADRL methods.
Value function decomposition methods can be divided into simple factorization methods, IGM-principle-based methods, and others. Communication structures are divided into fully connected, star, tree, neighbor, and layered types. We also survey the main applications of MADRL in real-world scenarios such as autonomous driving, traffic signal control, and recommendation systems. The classification in this paper is based on several common MADRL problem representations and on model-free MADRL methods, so there are many promising directions outside its focus, which we briefly analyze in Section 5, including extensive-form games, model-based MADRL, and safe and robust MADRL. Finally, we conclude the paper. With the rapid development of deep learning, the MARL field is changing quickly, and many previously intractable problems are gradually becoming tractable with MADRL methods. MADRL is a developing field that is attracting growing interest from scholars, but it also faces many challenges, such as non-stationarity, the curse of dimensionality, and credit assignment. Overall, DRL can improve the intelligence and efficiency of systems in various fields by learning optimal decision strategies, bringing tremendous impact and change to human society. In this paper, we provide a broad overview of the latest work in the emerging field of multi-agent deep reinforcement learning, including extensive-form games, model-based MADRL, and safe and robust MADRL. We expect this paper to be helpful both to new researchers entering this rapidly developing field and to existing experts who want to gain a comprehensive understanding and identify new directions based on the latest advances. © 2024 Science Press. All rights reserved.
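To make the value function decomposition idea concrete, the following is a minimal numerical sketch (not from the paper) of the simplest additive factorization, in the style of VDN: the joint action-value is modeled as the sum of per-agent utilities, Q_tot(a1, a2) = Q_1(a1) + Q_2(a2). Under such an additive decomposition the IGM (Individual-Global-Max) principle holds by construction: the joint greedy action coincides with the tuple of individually greedy actions. All array names here are illustrative.

```python
import numpy as np

# Toy additive value decomposition (VDN-style) for two agents in one
# fixed state. q1 and q2 stand in for the outputs of per-agent Q-networks.
rng = np.random.default_rng(0)
n_actions = 4
q1 = rng.standard_normal(n_actions)  # agent 1's utility per action
q2 = rng.standard_normal(n_actions)  # agent 2's utility per action

# Joint Q-table under additive factorization: Q_tot[a1, a2] = q1[a1] + q2[a2]
q_tot = q1[:, None] + q2[None, :]

# IGM check: the argmax of the joint table equals the pair of
# individual argmaxes, so decentralized greedy execution is consistent
# with the centralized joint greedy policy.
joint_greedy = tuple(np.unravel_index(np.argmax(q_tot), q_tot.shape))
individual_greedy = (int(np.argmax(q1)), int(np.argmax(q2)))
assert joint_greedy == individual_greedy
```

Methods such as QMIX relax the additive form to any monotonic mixing of per-agent utilities, which still preserves IGM while representing a richer class of joint value functions.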
Pages: 1547-1567
Page count: 20