Drone-Cell Trajectory Planning and Resource Allocation for Highly Mobile Networks: A Hierarchical DRL Approach

Cited by: 53
Authors
Shi, Weisen [1]
Li, Junling [1, 2]
Wu, Huaqing [1]
Zhou, Conghao [1]
Cheng, Nan [3]
Shen, Xuemin [1]
Affiliations
[1] Univ Waterloo, Dept Elect & Comp Engn, Waterloo, ON N2L 3G1, Canada
[2] Shenzhen Inst Artificial Intelligence & Robot Soc, Shenzhen, Guangdong, Peoples R China
[3] Xidian Univ, Sch Telecommun, Xian 710071, Peoples R China
Source
IEEE INTERNET OF THINGS JOURNAL | 2021 / Vol. 8 / Issue 12
Funding
Natural Sciences and Engineering Research Council of Canada; National Natural Science Foundation of China;
Keywords
Trajectory; Planning; Resource management; Throughput; Internet of Things; Radio access networks; Real-time systems; Drone cell; drone-assisted radio access network (RAN); space-air-ground integration; trajectory planning; VEHICULAR NETWORKS; DESIGN; 5G; UAVS;
DOI
10.1109/JIOT.2020.3020067
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Drone cell (DC) is envisioned to enable the dynamic service provisioning for radio access networks (RANs), in response to the spatial and temporal unevenness of user traffic. In this article, we propose a hierarchical deep reinforcement learning (DRL)-based multi-DC trajectory planning and resource allocation (HDRLTPRA) scheme for high-mobility users. The objective is to maximize the accumulative network throughput while satisfying user fairness, DC power consumption, and DC-to-ground link quality constraints. To address the high uncertainties of the environment, we decouple the multi-DC TPRA problem into two hierarchical subproblems, i.e., the higher level global trajectory planning (GTP) subproblem and the lower level local TPRA (LTPRA) subproblem. First, the GTP subproblem is to address trajectory planning for multiple DCs in the RAN over a long time period. To solve the subproblem, we propose a multiagent DRL-based GTP (MARL-GTP) algorithm in which the nonstationary state space caused by the multi-DC environment is addressed by the multiagent fingerprint technique. Second, based on the GTP results, each DC solves the LTPRA subproblem independently to control the movement and transmit power allocation based on the real-time user traffic variations. A deep deterministic policy gradient (DEP)-based LTPRA (DEP-LTPRA) algorithm is then proposed to solve the LTPRA subproblem. With the two algorithms addressing both subproblems at different decision granularities, the multi-DC TPRA problem can be resolved by the HDRLTPRA scheme. Simulation results show that 40% network throughput improvement can be achieved by the proposed HDRLTPRA scheme over the nonlearning-based TPRA scheme.
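The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`augment_with_fingerprint`, `step_toward`, `run_hierarchy`) and all parameters are hypothetical, and the learned GTP and LTPRA policies are replaced by fixed placeholders (a given waypoint list and a straight-line move). The fingerprint helper follows the form commonly described in the multiagent DRL literature, where the exploration rate and training iteration are appended to each agent's observation; whether the paper uses exactly this form is an assumption.

```python
import math


def augment_with_fingerprint(observation, epsilon, train_iteration):
    """Append a (epsilon, iteration) fingerprint to an agent's observation.

    Assumption: the fingerprint is the exploration rate plus the training
    iteration, so replayed samples carry a hint of how old the other
    agents' policies were when the sample was collected.
    """
    return tuple(observation) + (float(epsilon), float(train_iteration))


def step_toward(position, waypoint, max_step):
    """Move from position toward waypoint by at most max_step (same units)."""
    dx = waypoint[0] - position[0]
    dy = waypoint[1] - position[1]
    dist = math.hypot(dx, dy)
    if dist <= max_step:
        return waypoint
    scale = max_step / dist
    return (position[0] + dx * scale, position[1] + dy * scale)


def run_hierarchy(start, waypoints, local_steps_per_waypoint, max_step):
    """Two-timescale control loop for a single drone cell.

    Outer loop: coarse, long-period decisions (stand-in for GTP).
    Inner loop: fine, real-time decisions (stand-in for LTPRA); here it
    only moves the drone, whereas the paper's LTPRA also allocates
    transmit power.
    """
    position = start
    trace = [position]
    for waypoint in waypoints:
        for _ in range(local_steps_per_waypoint):
            position = step_toward(position, waypoint, max_step)
            trace.append(position)
    return trace


if __name__ == "__main__":
    print(augment_with_fingerprint((1.0, 2.0), 0.1, 5))
    print(run_hierarchy((0.0, 0.0), [(3.0, 4.0), (3.0, 0.0)], 2, 2.5))
```

The point of the sketch is the difference in decision granularity: one outer decision spans several inner steps, which mirrors how the abstract separates long-period trajectory planning from real-time movement and power control.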
Pages: 9800-9813
Page count: 14