Decentralized Trajectory and Power Control Based on Multi-Agent Deep Reinforcement Learning in UAV Networks

Cited by: 10
Authors
Chen, Binqiang [1 ]
Liu, Dong [1 ]
Hanzo, Lajos [2 ]
Affiliations
[1] Beihang University, Beijing, People's Republic of China
[2] University of Southampton, Southampton, Hampshire, England
Source
IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022) | 2022
Funding
National Natural Science Foundation of China;
Keywords
UAV; multi-agent deep reinforcement learning; MADDPG; power allocation; trajectory planning; unmanned aerial vehicles;
DOI
10.1109/ICC45855.2022.9838637
Chinese Library Classification
TN [Electronic technology, communication technology];
Subject Classification Code
0809;
Abstract
Unmanned aerial vehicles (UAVs) are capable of enhancing the coverage of existing cellular networks by acting as aerial base stations (ABSs). Owing to the limited on-board battery capacity and the dynamic topology of UAV networks, trajectory planning and interference coordination are crucial for providing satisfactory service, especially in emergency scenarios, where it is unrealistic to control all UAVs in a centralized manner by gathering global user information. Hence, we solve the decentralized joint trajectory and transmit power control problem of multi-UAV ABS networks. Our goal is to maximize the number of satisfied users, while minimizing the overall energy consumption of the UAVs. To allow each UAV to adjust its position and transmit power based solely on local, rather than global, observations, a multi-agent reinforcement learning (MARL) framework is conceived. To overcome the non-stationarity issue of MARL and to endow the UAVs with distributed decision-making capability, we resort to the centralized-training-with-decentralized-execution paradigm. By judiciously designing the reward, we propose a decentralized joint trajectory and power control (DTPC) algorithm with significantly reduced complexity. Our simulation results show that the proposed DTPC algorithm outperforms state-of-the-art deep reinforcement learning based methods, despite its low complexity.
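To make the centralized-training-with-decentralized-execution idea concrete, below is a minimal PyTorch sketch of the MADDPG-style setup the abstract alludes to: each UAV owns an actor that maps only its local observation to a trajectory-and-power action, while a critic used only during training conditions on the joint observations and actions of all UAVs. The network sizes, the toy dimensions N_UAVS, OBS_DIM and ACT_DIM, and the action semantics are illustrative assumptions, not the authors' DTPC implementation.

import torch
import torch.nn as nn

N_UAVS, OBS_DIM, ACT_DIM = 3, 8, 3  # assumed toy dimensions

class Actor(nn.Module):
    """Per-UAV policy: maps a LOCAL observation to an action,
    e.g. a horizontal displacement plus a transmit power level."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(),
            nn.Linear(64, ACT_DIM), nn.Tanh())  # bounded actions in [-1, 1]

    def forward(self, obs):
        return self.net(obs)

class CentralCritic(nn.Module):
    """Training-time critic: sees ALL observations and actions,
    which is what mitigates the non-stationarity each agent faces."""
    def __init__(self):
        super().__init__()
        in_dim = N_UAVS * (OBS_DIM + ACT_DIM)
        self.net = nn.Sequential(
            nn.Linear(in_dim, 128), nn.ReLU(),
            nn.Linear(128, 1))

    def forward(self, all_obs, all_act):
        # Concatenate the joint observation and joint action into one vector.
        return self.net(torch.cat([all_obs.flatten(1), all_act.flatten(1)], dim=1))

actors = [Actor() for _ in range(N_UAVS)]
critic = CentralCritic()

# Decentralized execution: each UAV acts on its own observation only.
obs = torch.randn(1, N_UAVS, OBS_DIM)  # batch of joint observations
actions = torch.stack([actors[i](obs[:, i]) for i in range(N_UAVS)], dim=1)

# Centralized training: the critic scores the joint state-action pair;
# ascending its output gives each actor a deterministic policy gradient.
q_value = critic(obs, actions)
actor_loss = -q_value.mean()
actor_loss.backward()
print(q_value.item())

In full MADDPG each agent would additionally keep its own critic, target networks and a shared replay buffer; the single critic above is a simplification meant only to show that joint information is used at training time while execution stays purely local.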
Pages: 3983-3988
Page count: 6
References
18 in total
[1] Al-Hourani A., Kandeepan S., Lardner S., "Optimal LAP Altitude for Maximum Coverage," IEEE Wireless Communications Letters, 2014, 3(6): 569-572.
[2] Azari M. M., Geraci G., Garcia-Rodriguez A., Pollin S., "UAV-to-UAV Communications in Cellular Networks," IEEE Transactions on Wireless Communications, 2020, 19(9): 6130-6144.
[3] Bor-Yaliniz I., Yanikomeroglu H., "The New Frontier in RAN Heterogeneity: Multi-Tier Drone-Cells," IEEE Communications Magazine, 2016, 54(11): 48-55.
[4] Hernandez-Leal P., Kartal B., Taylor M. E., "A Survey and Critique of Multiagent Deep Reinforcement Learning," Autonomous Agents and Multi-Agent Systems, 2019, 33(6): 750-797.
[5] Hu J., Zhang H., Song L., Schober R., Poor H. V., "Cooperative Internet of UAVs: Distributed Trajectory Design by Multi-Agent Deep Reinforcement Learning," IEEE Transactions on Communications, 2020, 68(11): 6807-6821.
[6] Lillicrap T. P., et al., "Continuous Control with Deep Reinforcement Learning," arXiv preprint, 2015.
[7] Liu C. H., Ma X., Gao X., Tang J., "Distributed Energy-Efficient Multi-UAV Navigation for Long-Term Communication Coverage by Deep Reinforcement Learning," IEEE Transactions on Mobile Computing, 2020, 19(6): 1274-1285.
[8] Lowe R., et al., "Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments," Proceedings of NIPS, 2017: 6379.
[9] Mkiramweni M. E., Yang C., Li J., Zhang W., "A Survey of Game Theory in Unmanned Aerial Vehicles Communications," IEEE Communications Surveys and Tutorials, 2019, 21(4): 3386-3416.
[10] Mozaffari M., Saad W., Bennis M., Nam Y.-H., Debbah M., "A Tutorial on UAVs for Wireless Networks: Applications, Challenges, and Open Problems," IEEE Communications Surveys and Tutorials, 2019, 21(3): 2334-2360.