Cooperative Internet of UAVs: Distributed Trajectory Design by Multi-Agent Deep Reinforcement Learning

Cited by: 135

Authors:

Hu, Jingzhi [1]

Zhang, Hongliang [1,2]

Song, Lingyang [1]

Schober, Robert [3]

Poor, H. Vincent [2]

Affiliations:

[1] Peking Univ, Dept Elect, Beijing 100871, Peoples R China

[2] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA

[3] Friedrich Alexander Univ Erlangen Nuremberg, Inst Digital Commun, D-91058 Erlangen, Germany

Source:

IEEE TRANSACTIONS ON COMMUNICATIONS | 2020 / Vol. 68 / No. 11

Funding:

National Natural Science Foundation of China; U.S. National Science Foundation

Keywords:

Sensors; Task analysis; Trajectory; Internet; Machine learning; Protocols; Electronic mail; Cooperative Internet of UAVs; distributed trajectory design; deep reinforcement learning; AERIAL VEHICLE NETWORKS; CELLULAR INTERNET; OPTIMIZATION;

DOI:

10.1109/TCOMM.2020.3013599

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline classification codes:

0808; 0809

Abstract:

Due to the advantages of flexible deployment and extensive coverage, unmanned aerial vehicles (UAVs) have significant potential for sensing applications in the next generation of cellular networks, which will give rise to a cellular Internet of UAVs. In this article, we consider a cellular Internet of UAVs, where the UAVs execute sensing tasks through cooperative sensing and transmission to minimize the age of information (AoI). However, the cooperative sensing and transmission is tightly coupled with the UAVs' trajectories, which makes the trajectory design challenging. To tackle this challenge, we propose a distributed sense-and-send protocol, where the UAVs determine the trajectories by selecting from a discrete set of tasks and a continuous set of locations for sensing and transmission. Based on this protocol, we formulate the trajectory design problem for AoI minimization and propose a compound-action actor-critic (CA2C) algorithm to solve it based on deep reinforcement learning. The CA2C algorithm can learn the optimal policies for actions involving both continuous and discrete variables and is suited for the trajectory design. Our simulation results show that the CA2C algorithm outperforms four baseline algorithms. Also, we show that by dividing the tasks, cooperative UAVs can achieve a lower AoI compared to non-cooperative UAVs.
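As a rough illustration of the metric named in the abstract, the sketch below computes the age of information (AoI) per time slot under the common definition: the time elapsed since the freshest received update was generated. It is not taken from the paper. The function name `aoi_trace`, the slotted time model, and the `deliveries` mapping are assumptions made only for this example.

```python
from typing import Dict, List


def aoi_trace(deliveries: Dict[int, int], horizon: int, initial_age: int = 0) -> List[int]:
    """Return the AoI at each slot 0..horizon-1 (hypothetical helper).

    deliveries maps a delivery slot to the generation slot of the update
    received in that slot. Time is assumed to be slotted.
    """
    ages: List[int] = []
    age = initial_age
    for t in range(horizon):
        if t in deliveries:
            # A received update resets the age to its own delay,
            # unless the receiver already holds fresher information.
            age = min(age, t - deliveries[t])
        ages.append(age)
        age += 1  # information grows one slot older until the next update
    return ages


if __name__ == "__main__":
    # Updates generated in slots 1 and 5 are delivered in slots 3 and 6.
    trace = aoi_trace({3: 1, 6: 5}, horizon=8)
    print(trace)
    print(sum(trace) / len(trace))  # time-averaged AoI over the horizon
```

In this toy model, a lower time-averaged AoI corresponds to more frequent or faster deliveries of sensing results, which is the quantity the abstract says the trajectory design aims to minimize.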


Pages: 6807-6821

Page count: 15
