Autonomous decision-making of UAV cluster with communication constraints based on reinforcement learning

Cited by: 0
Authors
Zhang, Ting-Ting [1, 5]
Chen, Yan [1 ]
Dong, Ren-zhi [2 ]
Chen, Tao [3 ]
Liu, Yan [4 ]
Zhang, Kai-Ge [4 ]
Song, Ai-Guo [5 ]
Lan, Yu-Shi [6 ]
Affiliations
[1] Army Engn Univ PLA, Coll Command Control Engn, Nanjing, Jiangsu, Peoples R China
[2] Cetccloud Beijing Technol Co, Nanjing, Jiangsu, Peoples R China
[3] Natl Univ Def Technol, Changsha, Hunan, Peoples R China
[4] North Automat Control Technol Inst, Taiyuan, Shanxi, Peoples R China
[5] Southeast Univ, Nanjing, Jiangsu, Peoples R China
[6] Nanjing Res Inst Elect Engn, Nanjing, Jiangsu, Peoples R China
Source
JOURNAL OF CLOUD COMPUTING-ADVANCES SYSTEMS AND APPLICATIONS | 2025 / Vol. 14 / Issue 01
Funding
China Postdoctoral Science Foundation;
Keywords
UAV cluster; Autonomous decision-making; Communication constraint; Reinforcement learning;
DOI
10.1186/s13677-025-00738-9
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Artificial intelligence techniques are increasingly applied to the study of autonomous decision-making in distributed unmanned cluster systems. However, communication constraints have become a major bottleneck that restricts their performance. To address the need for unmanned aerial vehicles (UAVs) to execute collaborative attack missions in complex communication-constrained environments, this paper proposes an autonomous decision-making method for UAVs based on Multi-Agent Reinforcement Learning (MARL). First, the autonomous decision-making processes of UAV clusters are modeled as Decentralized Partially Observable Markov Decision Processes (Dec-POMDPs). Next, the algorithm is enhanced within the framework of Multi-Agent Deep Deterministic Policy Gradient (MADDPG) by designing an explicit inter-agent communication mechanism to achieve information exchange among UAVs. Subsequently, the algorithm uses Long Short-Term Memory (LSTM) networks to process the local observations of the UAVs, making the transmitted information more effective by combining historical data with current observations. Finally, multiple rounds of experiments are conducted across various communication-constrained scenarios. Simulation results indicate that the proposed method improves task completion capability by 46.0% and stability by 24.9% compared with the baseline MADDPG algorithm. Additionally, the algorithm demonstrates better generalization and good scalability, adapting effectively to varying numbers of UAVs. This research provides new theoretical insights and a technical framework for UAV collaboration in environments with communication constraints, which is of practical importance for improving the capability and application scope of UAV systems.
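Note: the abstract names the Dec-POMDP model without stating it. As a reading aid only, the block below gives the standard textbook form of a Dec-POMDP and its cooperative objective. All symbols are assumptions drawn from the general literature, not from the paper, and the paper's own state, observation, and reward definitions may differ.

```latex
% Standard Dec-POMDP tuple (textbook notation; symbols are assumed, not the paper's).
\[
\mathcal{M} = \big\langle \mathcal{I},\ \mathcal{S},\ \{\mathcal{A}_i\}_{i \in \mathcal{I}},\ P,\ R,\ \{\Omega_i\}_{i \in \mathcal{I}},\ O,\ \gamma \big\rangle
\]
% \mathcal{I} = \{1,\dots,n\} : set of agents (here, UAVs)
% \mathcal{S}                 : set of global states
% \mathcal{A}_i               : action set of agent i; joint action a = (a_1,\dots,a_n)
% P(s' \mid s, a)             : state transition probability
% R(s, a)                     : shared team reward
% \Omega_i                    : observation set of agent i
% O(o \mid s', a)             : joint observation probability, o = (o_1,\dots,o_n)
% \gamma \in [0, 1)           : discount factor

% Each agent acts on its own action-observation history h_{i,t};
% the team seeks a joint policy maximizing the expected discounted return.
\[
\pi_i : h_{i,t} \mapsto a_{i,t},
\qquad
J(\pi_1,\dots,\pi_n) = \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, R(s_t, a_t) \right]
\]
```

In this reading, the LSTM mentioned in the abstract plays the role of a learned summary of the history h_{i,t}, and the communication mechanism adds messages from other UAVs to each agent's input; how the paper realizes either is not specified in this record.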
Pages: 14