Joint UAV trajectory and communication design with heterogeneous multi-agent reinforcement learning

Cited by: 9
Authors
Zhou, Xuanhan [1]
Xiong, Jun [1]
Zhao, Haitao [1]
Liu, Xiaoran [1]
Ren, Baoquan [2]
Zhang, Xiaochen [1]
Wei, Jibo [1]
Yin, Hao [2]
Affiliations
[1] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410073, Peoples R China
[2] Acad Mil Sci PLA, Syst Engn Inst, Beijing 100091, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
unmanned aerial vehicle (UAV); trajectory design; resource allocation; multi-agent deep reinforcement learning (MADRL); heterogeneous agents; ENERGY-EFFICIENT; FAIR COMMUNICATION; NETWORKS; DEPLOYMENT;
DOI
10.1007/s11432-023-3906-3
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Unmanned aerial vehicles (UAVs) are recognized as an effective means of delivering emergency communication services when terrestrial infrastructure is unavailable. This paper investigates a multi-UAV-assisted communication system, where we jointly optimize the UAVs' trajectories, user association, and the transmit power of ground users (GUs) to maximize a defined fairness-weighted throughput metric. Owing to the dynamic nature of UAVs, this problem has to be solved in real time. However, the problem's non-convex and combinatorial attributes pose challenges for conventional optimization-based algorithms, particularly in scenarios without central controllers. To address this issue, we propose a multi-agent deep reinforcement learning (MADRL) approach to provide distributed and online solutions. In contrast to previous MADRL-based methods that consider only UAV agents, we model UAVs and GUs as heterogeneous agents sharing a common objective. Specifically, UAVs are tasked with optimizing their trajectories, while GUs are responsible for selecting a UAV for association and determining a transmit power level. To learn policies for these heterogeneous agents, we design a heterogeneous coordinated QMIX (HC-QMIX) algorithm to train local Q-networks in a centralized manner. With these well-trained local Q-networks, UAVs and GUs can make individual decisions based on their local observations. Extensive simulation results demonstrate that the proposed algorithm outperforms state-of-the-art benchmarks in terms of total throughput and system fairness.
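The abstract does not spell out how HC-QMIX works internally, so the following is only a minimal sketch of the generic QMIX idea it builds on: each agent acts greedily on its own local Q-values, and a mixing function with non-negative weights combines the chosen values into a joint value during centralized training. The action counts, Q-values, weights, and function names below are hypothetical illustrations, not taken from the paper, and the mixing network is reduced to a single linear layer for brevity.

```python
def greedy_action(q_values):
    """Decentralized execution: pick the action with the largest local Q-value."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def monotonic_mix(agent_qs, weights, bias):
    """QMIX-style mixing, reduced to one linear layer (a simplification).

    Taking abs() of each weight keeps dQ_tot/dQ_i >= 0, so each agent's
    greedy local action also maximizes the joint value.
    """
    return sum(abs(w) * q for w, q in zip(weights, agent_qs)) + bias


# Hypothetical local Q-values for two heterogeneous agents (assumed numbers).
uav_q = [0.2, 1.3, -0.4, 0.9, 0.1]  # e.g. 5 movement actions for a UAV
gu_q = [0.5, 0.1, 2.0, 1.1]         # e.g. 2 UAVs x 2 power levels, flattened

uav_action = greedy_action(uav_q)
gu_action = greedy_action(gu_q)

# In QMIX these come from hypernetworks fed with the global state;
# here they are fixed, assumed numbers.
weights = [-0.7, 1.5]
bias = 0.3

q_tot = monotonic_mix([uav_q[uav_action], gu_q[gu_action]], weights, bias)
print(uav_action, gu_action, q_tot)
```

Because the mixing weights are forced to be non-negative, the pair of individually greedy actions is also the maximizer of the joint value, which is what lets UAVs and GUs act on local observations alone after centralized training.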
Pages: 21