Mean Field Deep Reinforcement Learning for Fair and Efficient UAV Control

Cited by: 60

Authors:

Chen, Dezhi [1,2]

Qi, Qi [1,2]

Zhuang, Zirui [1,2]

Wang, Jingyu [1,2]

Liao, Jianxin [1,2]

Han, Zhu [3,4]

Affiliations:

[1] Beijing Univ Posts & Telecommun, State Key Lab Networking & Switching Technol, Beijing 100876, Peoples R China

[2] EBUPT COM, Beijing 100191, Peoples R China

[3] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004 USA

[4] Kyung Hee Univ, Dept Comp Sci & Engn, Seoul 446701, South Korea

Source:

IEEE INTERNET OF THINGS JOURNAL | 2021 / Vol. 8 / No. 02

Funding:

National Natural Science Foundation of China;

Keywords:

Unmanned aerial vehicles; Internet of Things; Aerospace electronics; Games; Energy consumption; Mathematical model; Reinforcement learning; Mean field; multiagent deep reinforcement learning (DRL); trust region policy optimization (TRPO); unmanned aerial vehicle (UAV); COMMUNICATION; DEPLOYMENT; PLACEMENT; COVERAGE;

DOI:

10.1109/JIOT.2020.3008299

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

Unmanned aerial vehicles (UAVs) can provide flexible network coverage services and can be applied in many scenarios, such as emergency communication and network access in areas without terrestrial network coverage. However, UAVs have a relatively short communication range and restricted energy resources. In extreme conditions such as disasters, communication bandwidth may also be limited, so that a UAV cannot exchange large amounts of information with the server; a decentralized solution is therefore expected. In addition, the interaction between multiple objectives and multiple UAVs leads to a huge state space, which makes large-scale practical applications difficult. To simplify these complex interactions, we model the UAV control problem as a mean-field game (MFG). We propose a new UAV control method, mean-field trust region policy optimization (MFTRPO), which uses the MFG method to construct the Hamilton-Jacobi-Bellman/Fokker-Planck-Kolmogorov equation that yields the optimal solution, and addresses the difficulties of practical application through trust region policy optimization and neural network feature embedding. The proposed method: 1) maximizes communication efficiency while ensuring fair communication range and network connectivity; 2) fuses mean-field theory with deep reinforcement learning techniques; and 3) is scalable and adaptive. We conduct extensive simulations for performance evaluation. The simulation results show that MFTRPO significantly and consistently outperforms two commonly used baseline methods in terms of coverage, fairness, and energy consumption.
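
The mean-field simplification mentioned in the abstract can be illustrated with a small sketch. It is not the paper's MFTRPO implementation: the function name `mean_field_observation`, the three-component state `(x, y, energy)`, and the use of a plain arithmetic mean over all other UAVs are assumptions made only for illustration. The point it shows is that summarizing the other agents by an aggregate keeps each agent's input size fixed, however many UAVs are present.

```python
from typing import List, Sequence


def mean_field_observation(states: Sequence[Sequence[float]], i: int) -> List[float]:
    """Build agent i's input: its own state plus the mean state of all others.

    Hypothetical helper for illustration; the state layout is an assumption.
    """
    own = list(states[i])
    others = [s for j, s in enumerate(states) if j != i]
    if others:
        # Column-wise average over the other agents' state vectors.
        mean = [sum(col) / len(others) for col in zip(*others)]
    else:
        # A lone agent has no neighbors; use a zero mean field.
        mean = [0.0] * len(own)
    return own + mean


# Three hypothetical UAVs, each described by (x, y, remaining energy).
states = [
    [0.0, 0.0, 1.0],
    [2.0, 0.0, 0.5],
    [4.0, 2.0, 0.9],
]
obs = mean_field_observation(states, 0)

# The input length depends only on the state size, not on the number of UAVs.
many = [[float(k), 0.0, 1.0] for k in range(50)]
obs_many = mean_field_observation(many, 0)
```

With this encoding, a policy network sees an input of the same length for 3 UAVs as for 50, which is one way to read the abstract's claim that the mean-field formulation tames the state space.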

Pages: 813-828

Page count: 16

References (54 in total):

[1] Comprehensive Energy Consumption Model for Unmanned Aerial Vehicles, Based on Empirical Studies of Battery Performance [J].

Abeywickrama, Hasini Viranga; Jayawickrama, Beeshanga Abewardana; He, Ying; Dutkiewicz, Eryk.

IEEE ACCESS, 2018, 6: 58383-58394

[2] 3-D Placement of an Unmanned Aerial Vehicle Base Station for Maximum Coverage of Users With Different QoS Requirements [J].

Alzenad, Mohamed; El-Keyi, Amr; Yanikomeroglu, Halim.

IEEE WIRELESS COMMUNICATIONS LETTERS, 2018, 7 (01): 38-41

[3] Mobility in the Sky: Performance and Mobility Analysis for Cellular-Connected UAVs [J].

Amer, Ramy; Saad, Walid; Marchetti, Nicola.

IEEE TRANSACTIONS ON COMMUNICATIONS, 2020, 68 (05): 3229-3246

[4] Basar T., 1998, DYNAMIC NONCOOPERATI

[5] A critical review on unmanned aerial vehicles power supply and energy management: Solutions, strategies, and prospects [J].

Boukoberine, Mohamed Nadir; Zhou, Zhibin; Benbouzid, Mohamed.

APPLIED ENERGY, 2019, 255

[6] Using Reinforcement Learning to Minimize the Probability of Delay Occurrence in Transportation [J].

Cao, Zhiguang; Guo, Hongliang; Song, Wen; Gao, Kaizhou; Chen, Zhenghua; Zhang, Le; Zhang, Xuexi.

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2020, 69 (03): 2424-2436

[7] Challita U, 2018, IEEE ICC

[8] Interference Management for Cellular-Connected UAVs: A Deep Reinforcement Learning Approach [J].

Challita, Ursula; Saad, Walid; Bettstetter, Christian.

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2019, 18 (04): 2125-2140

[9] Chen M., 2017, P IEEE GLOB COMM C G, P1

[10] Caching in the Sky: Proactive Deployment of Cache-Enabled Unmanned Aerial Vehicles for Optimized Quality-of-Experience [J].

Chen, Mingzhe; Mozaffari, Mohammad; Saad, Walid; Yin, Changchuan; Debbah, Merouane; Hong, Choong Seon.

IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2017, 35 (05): 1046-1061
