Cellular UAV-to-Device Communications: Trajectory Design and Mode Selection by Multi-Agent Deep Reinforcement Learning

Times Cited: 60
Authors
Wu, Fanyi [1 ]
Zhang, Hongliang [1 ,2 ]
Wu, Jianjun [1 ]
Song, Lingyang [1 ]
Affiliations
[1] Peking Univ, Dept Elect Engn, Beijing 100871, Peoples R China
[2] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004 USA
Funding
National Natural Science Foundation of China
Keywords
Sensors; Mobile handsets; Trajectory; Internet; Quality of service; Cellular networks; Machine learning; UAV-to-Device communications; cellular Internet of UAVs; trajectory design; deep reinforcement learning; OPTIMIZATION; NETWORKS; INTERNET
DOI
10.1109/TCOMM.2020.2986289
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology; Communication Technology]
Discipline Classification Codes
0808; 0809
Abstract
In current unmanned aircraft systems (UASs) for sensing services, unmanned aerial vehicles (UAVs) transmit their sensory data to terrestrial mobile devices over the unlicensed spectrum, where interference from surrounding terminals is uncontrollable due to opportunistic channel access. In this paper, we consider a cellular Internet of UAVs to guarantee the Quality-of-Service (QoS), in which the sensory data can be delivered to the mobile devices either by UAV-to-Device (U2D) communications over the cellular network or by relaying through the base station (BS). Since sensing and transmission both influence UAV trajectories, we study the trajectory design problem jointly with sensing and transmission. This is a Markov decision process (MDP) with a large state-action space, so we utilize multi-agent deep reinforcement learning (DRL) to approximate the state-action value function, and propose a multi-UAV trajectory design algorithm to solve the problem. Simulation results show that the proposed algorithm achieves a higher total utility than the policy gradient and single-agent baselines.
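To make the abstract's method concrete, the following is a minimal, hypothetical sketch of the multi-agent DRL setup it describes: each UAV holds an independent deep Q-network whose flat action couples a movement step (trajectory design) with a transmission-mode choice between U2D and BS relaying (mode selection). The class name UAVAgent, the move and mode sets, the network size, and all hyperparameters are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch (not the paper's implementation): one independent DQN per UAV.
# A flat action index encodes (movement step, transmission mode).
import random
import torch
import torch.nn as nn

MOVES = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]   # hover + 4 horizontal steps (assumed)
MODES = ["U2D", "BS"]                                # direct U2D vs. relaying through the BS
N_ACTIONS = len(MOVES) * len(MODES)

class UAVAgent:
    """One independent Q-learning agent per UAV (illustrative architecture)."""

    def __init__(self, state_dim, eps=0.1, gamma=0.95, lr=1e-3):
        self.q = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, N_ACTIONS),
        )
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.eps, self.gamma = eps, gamma

    def act(self, state):
        # epsilon-greedy over the coupled (move, mode) action space
        if random.random() < self.eps:
            return random.randrange(N_ACTIONS)
        with torch.no_grad():
            s = torch.as_tensor(state, dtype=torch.float32)
            return int(self.q(s).argmax())

    def update(self, state, action, reward, next_state):
        # one-step TD(0) update; replay buffer and target network omitted for brevity
        s = torch.as_tensor(state, dtype=torch.float32)
        s_next = torch.as_tensor(next_state, dtype=torch.float32)
        q_sa = self.q(s)[action]
        with torch.no_grad():
            target = reward + self.gamma * self.q(s_next).max()
        loss = (q_sa - target) ** 2
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

def decode(action):
    """Map a flat action index back to (movement step, transmission mode)."""
    return MOVES[action // len(MODES)], MODES[action % len(MODES)]

In a full training loop, one such agent per UAV would interact with a channel-and-sensing simulator, with the reward encoding the utility (e.g., sensing success plus transmission QoS) that the paper maximizes; the independent-learner decomposition reflects the abstract's motivation that the joint state-action space is too large for a centralized solution.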
Pages: 4175-4189
Number of Pages: 15