UAV Control for Wireless Service Provisioning in Critical Demand Areas: A Deep Reinforcement Learning Approach

Cited by: 32

Authors:

Ho, Tai Manh [1]

Nguyen, Kim-Khoa [1]

Cheriet, Mohamed [1]

Affiliations:

[1] Univ Quebec, Synchromedia Lab, Ecole Technol Super, Montreal, PQ H3C 1K3, Canada

Source:

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY | 2021 / Vol. 70 / No. 07

Keywords:

Energy consumption; Wireless communication; Propulsion; Trajectory; Optimization; Unmanned aerial vehicles; Vehicle dynamics; 5G; airborne; UAV control; deep reinforcement learning; trust region policy optimization; ENERGY-EFFICIENT; FAIR COMMUNICATION; TRAJECTORY DESIGN; ALLOCATION; NETWORKS;

DOI:

10.1109/TVT.2021.3088129

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code:

0808 ; 0809 ;

Abstract:

In this paper, we investigate the problem of wireless service provisioning through a rotary-wing UAV which can serve as an aerial base station (BS) to communicate with multiple ground terminals (GTs) in a boost demand area. Our objective is to optimize the UAV control to maximize the UAV's energy efficiency, in which both aerodynamic energy and communication energy are considered, while ensuring the communication requirements for each GT and for the backhaul link between the UAV and the terrestrial BS. The mobility of the UAV and GTs leads to time-varying channel conditions that make the environment dynamic. We formulate a nonconvex optimization problem for controlling the UAV, considering the practical angle-dependent Rician fading channels between the UAV and GTs, and between the UAV and the terrestrial BS. Traditional optimization approaches are not able to handle the dynamic environment and the high complexity of the problem in real time. We propose to use a deep reinforcement learning-based approach, namely Deep Deterministic Policy Gradient (DDPG), to solve the formulated nonconvex problem of UAV control with a continuous action space. The approach takes into account the real-time state of the environment, including time-varying UAV-ground channel conditions, the available onboard energy of the UAV, and the communication requirements of the GTs. However, the DDPG method may not achieve good performance in an unstable environment and involves a large number of hyperparameters. We therefore extend our approach to use the Trust Region Policy Optimization (TRPO) method, which can improve the performance of the UAV compared to the DDPG method in such a dynamic environment.


Pages: 7138-7152

Page count: 15

References: 39 in total
