Joint UAV trajectory and communication design with heterogeneous multi-agent reinforcement learning

Times cited: 9

Authors:

Zhou, Xuanhan [1]

Xiong, Jun [1]

Zhao, Haitao [1]

Liu, Xiaoran [1]

Ren, Baoquan [2]

Zhang, Xiaochen [1]

Wei, Jibo [1]

Yin, Hao [2]

Affiliations:

[1] Natl Univ Def Technol, Coll Elect Sci & Technol, Changsha 410073, Peoples R China

[2] Acad Mil Sci PLA, Syst Engn Inst, Beijing 100091, Peoples R China

Source:

SCIENCE CHINA-INFORMATION SCIENCES | 2024 / Vol. 67 / Issue 3

Funding:

National Natural Science Foundation of China;

Keywords:

unmanned aerial vehicle (UAV); trajectory design; resource allocation; multi-agent deep reinforcement learning (MADRL); heterogeneous agents; ENERGY-EFFICIENT; FAIR COMMUNICATION; NETWORKS; DEPLOYMENT;

DOI:

10.1007/s11432-023-3906-3

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology];

Subject classification code:

0812;

Abstract:

Unmanned aerial vehicles (UAVs) are recognized as effective means of delivering emergency communication services when terrestrial infrastructure is unavailable. This paper investigates a multi-UAV-assisted communication system in which we jointly optimize the UAVs' trajectories, user association, and the transmit power of ground users (GUs) to maximize a defined fairness-weighted throughput metric. Owing to the mobility of UAVs, this problem has to be solved in real time. However, its non-convex and combinatorial nature poses challenges for conventional optimization-based algorithms, particularly in scenarios without a central controller. To address this issue, we propose a multi-agent deep reinforcement learning (MADRL) approach that provides distributed, online solutions. In contrast to previous MADRL-based methods that consider only UAV agents, we model UAVs and GUs as heterogeneous agents sharing a common objective. Specifically, UAVs are tasked with optimizing their trajectories, while GUs are responsible for selecting a UAV to associate with and choosing a transmit power level. To learn policies for these heterogeneous agents, we design a heterogeneous coordinated QMIX (HC-QMIX) algorithm that trains local Q-networks in a centralized manner. With these well-trained local Q-networks, UAVs and GUs can make individual decisions based on their local observations. Extensive simulation results demonstrate that the proposed algorithm outperforms state-of-the-art benchmarks in terms of total throughput and system fairness.


Pages: 21
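The abstract states that HC-QMIX trains local Q-networks centrally and then lets each UAV and GU decide from its own local observations. The record gives no details of HC-QMIX itself, so what follows is only a minimal sketch of the standard QMIX mechanism the algorithm is named after. The mixer's weights are forced to be non-negative, which makes the joint value Q_tot non-decreasing in every agent's Q-value, so each agent's individual argmax also maximizes Q_tot. Everything specific here is a hypothetical assumption for illustration: the names `MonotonicMixer`, `greedy_actions` and `best_joint_value`, the untrained random hypernetwork weights, the three-dimensional global state, and the heterogeneous action spaces (two UAVs with five moves each, and one GU choosing among six (UAV, power level) pairs).

```python
import itertools
import math
import random


def elu(x: float) -> float:
    # ELU is monotonically increasing, so it preserves the mixer's monotonicity.
    return x if x > 0 else math.exp(x) - 1.0


def linear(s, W):
    # s: vector of length d; W: d x m matrix (list of rows) -> vector of length m.
    return [sum(s[i] * W[i][j] for i in range(len(s))) for j in range(len(W[0]))]


class MonotonicMixer:
    """Toy QMIX-style mixer with hypothetical, untrained parameters.

    Q_tot = w2 . elu(W1^T q + b1) + b2, where W1, b1, w2 and b2 come from linear
    "hypernetworks" of the global state. abs() keeps W1 and w2 non-negative,
    so Q_tot is non-decreasing in every agent's Q-value.
    """

    def __init__(self, n_agents, state_dim, embed_dim=4, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.n, self.e = n_agents, embed_dim
        self.hyper_w1 = mat(state_dim, n_agents * embed_dim)
        self.hyper_b1 = mat(state_dim, embed_dim)
        self.hyper_w2 = mat(state_dim, embed_dim)
        self.hyper_b2 = mat(state_dim, 1)

    def q_tot(self, agent_qs, state):
        w1 = [abs(v) for v in linear(state, self.hyper_w1)]  # non-negative
        b1 = linear(state, self.hyper_b1)
        w2 = [abs(v) for v in linear(state, self.hyper_w2)]  # non-negative
        b2 = linear(state, self.hyper_b2)[0]
        hidden = [
            elu(sum(agent_qs[a] * w1[a * self.e + k] for a in range(self.n)) + b1[k])
            for k in range(self.e)
        ]
        return sum(h * w for h, w in zip(hidden, w2)) + b2


def greedy_actions(local_qs):
    # Decentralized execution: each agent takes the argmax of its own local Q-values.
    return [max(range(len(q)), key=q.__getitem__) for q in local_qs]


def best_joint_value(mixer, local_qs, state):
    # Brute-force maximum of Q_tot over all joint actions (tiny examples only).
    return max(
        mixer.q_tot([q[a] for q, a in zip(local_qs, joint)], state)
        for joint in itertools.product(*(range(len(q)) for q in local_qs))
    )


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical heterogeneous action spaces: UAV, UAV, GU.
    local_qs = [[rng.uniform(-1, 1) for _ in range(n)] for n in (5, 5, 6)]
    state = [0.3, -0.7, 1.1]  # hypothetical global state used only by the mixer
    mixer = MonotonicMixer(n_agents=3, state_dim=3)
    acts = greedy_actions(local_qs)
    greedy_val = mixer.q_tot([q[a] for q, a in zip(local_qs, acts)], state)
    print(acts, greedy_val, best_joint_value(mixer, local_qs, state))
```

The greedy and brute-force values printed by the script should coincide, which is the property that lets trained agents act independently at run time. In actual QMIX the per-agent Q-networks and the hypernetworks are neural networks trained end to end on the temporal-difference error of Q_tot; this sketch omits training entirely.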