Trajectory and Communication Design for Cache-Enabled UAVs in Cellular Networks: A Deep Reinforcement Learning Approach

Cited by: 31
Authors
Ji, Jiequ [1 ]
Zhu, Kun [1 ]
Cai, Lin [2 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Univ Victoria, Dept Elect & Comp Engn, Victoria, BC V8W 3P6, Canada
Funding
National Natural Science Foundation of China;
Keywords
Trajectory; Delays; Autonomous aerial vehicles; Optimization; Cellular networks; Reinforcement learning; Communication system security; Unmanned aerial vehicle; edge caching; trajectory design; cache placement; reinforcement learning; PERFORMANCE ANALYSIS; DEPLOYMENT; SECURE; OPTIMIZATION; EDGE;
DOI
10.1109/TMC.2022.3181308
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Code
0812 ;
Abstract
In this article, we investigate content transmission in a heavily crowded multiple-access cellular network whose data traffic is offloaded through a combination of edge caching and unmanned aerial vehicle (UAV) communication. In this context, we formulate a novel optimization problem that minimizes the sum content acquisition delay of users by jointly optimizing multiuser association, cache placement, UAV trajectory, and transmission power over a given flight duration. However, due to the uncertainty of the environment (e.g., random content requests and dynamic UAV positions), it is often difficult and impractical to solve the formulated problem using conventional optimization methods. To this end, we model the problem as a partially observable stochastic game in which the macro base station (MBS) and the UAVs act as agents that collectively interact with the environment while receiving distinct observations. Moreover, we leverage the Proximal Policy Optimization (PPO) learning strategy and propose a novel Dual-Clip PPO-based algorithm to solve the converted problem. To guide agent exploration, a new exploration criterion is introduced in which each UAV agent obtains an intrinsic reward when it explores beyond the boundary of explored regions (BeBold); the MBS agent, by contrast, receives only the extrinsic reward given by the environment. Numerical results reveal that the proposed algorithm outperforms the standard PPO-based deep reinforcement learning algorithm, and that the proposed joint design scheme achieves a dramatic reduction in content acquisition delay compared with the benchmark schemes.
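The Dual-Clip PPO objective mentioned in the abstract can be illustrated with a minimal per-sample sketch. The clipping threshold `eps = 0.2`, the dual-clip constant `dual_c = 3.0`, and the function name `dual_clip_ppo_loss` are illustrative assumptions; the abstract does not state the paper's actual hyperparameters or implementation.

```python
def dual_clip_ppo_loss(ratio, advantage, eps=0.2, dual_c=3.0):
    """Per-sample Dual-Clip PPO surrogate loss (to be minimized).

    ratio:     importance-sampling ratio pi_new(a|s) / pi_old(a|s).
    advantage: estimated advantage A(s, a).

    Standard PPO clips `ratio` to [1 - eps, 1 + eps]. The dual clip
    additionally bounds the objective from below by dual_c * advantage
    when the advantage is negative, so a very large ratio cannot
    produce an arbitrarily large, destabilizing policy update.
    """
    surr_unclipped = ratio * advantage
    surr_clipped = max(1.0 - eps, min(ratio, 1.0 + eps)) * advantage
    objective = min(surr_unclipped, surr_clipped)  # standard PPO objective
    if advantage < 0.0:
        # Dual clip: lower-bound the objective for negative advantages.
        objective = max(objective, dual_c * advantage)
    return -objective  # ascent on the objective = descent on this loss
```

For example, with `ratio = 10` and `advantage = -1`, the standard PPO objective would be -10, while the dual clip bounds it at `dual_c * advantage = -3`, limiting the magnitude of the update.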
Pages: 6190-6204
Page count: 15