Distributed Federated Deep Reinforcement Learning Based Trajectory Optimization for Air-Ground Cooperative Emergency Networks

Cited: 34
Authors
Wu, Silei [1 ]
Xu, Wenjun [1 ,2 ]
Wang, Fengyu [1 ]
Li, Guojun [3 ]
Pan, Miao [4 ]
Affiliations
[1] Beijing Univ Posts & Telecommun BUPT, Key Lab Universal Wireless Commun, Minist Educ, Beijing 100876, Peoples R China
[2] Peng Cheng Lab, Shenzhen, Guangdong, Peoples R China
[3] Chongqing Univ Posts & Telecommun, Chongqing 400065, Peoples R China
[4] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004 USA
Funding
National Natural Science Foundation of China
Keywords
Trajectory optimization; Interference; Vehicle dynamics; Convergence; Autonomous aerial vehicles; Atmospheric modeling; Relays; Air-ground cooperative emergency networks; Multi-agent; Reinforcement learning; UAV; Communication
DOI
10.1109/TVT.2022.3175592
CLC Classification
TM [Electrical Engineering]; TN [Electronic and Communication Technology]
Discipline Codes
0808; 0809
Abstract
Air-ground cooperative emergency networks can assist with the rapid reconstruction of communication in a disaster area, where unmanned aerial vehicles (UAVs) and unmanned ground vehicles (UGVs) are deployed as base stations. The trajectory optimization of these emergency base stations is of vital importance to communication performance, which in turn determines the timeliness and effectiveness of rescue. In this paper, a federated multi-agent deep deterministic policy gradient (F-MADDPG) based trajectory optimization algorithm is proposed to maximize the average spectrum efficiency. Specifically, the properties of MADDPG are inherited to jointly control multiple vehicles, and federated averaging (FA) is utilized to eliminate data isolation and accelerate convergence. A distributed F-MADDPG (DF-MADDPG) is further designed to reduce the communication overhead via a distributed architecture. Simulation results indicate that the proposed F-MADDPG and DF-MADDPG based algorithms significantly outperform existing trajectory optimization algorithms in terms of both average spectrum efficiency and convergence speed.
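The FA step the abstract refers to can be pictured concretely. Below is a minimal Python sketch of federated averaging over per-agent actor parameters, assuming each agent's MADDPG actor is represented as a list of numpy weight arrays; the function name fed_average, the sample-count weighting, and the toy layer shapes are illustrative assumptions, not details taken from the paper.

    # Minimal sketch of the federated averaging (FA) step described in the
    # abstract: each UAV/UGV agent trains its own MADDPG actor locally, then
    # a coordinator averages the parameters so agents share what they learn
    # without exchanging raw experience. Names and shapes are illustrative.
    import numpy as np

    def fed_average(agent_params, weights=None):
        """Element-wise weighted average of per-agent parameter lists.

        agent_params: list over agents; each entry is a list of np.ndarray
                      (e.g., the layer weights of that agent's actor network).
        weights:      optional per-agent weights (e.g., proportional to local
                      sample counts, as in FedAvg); uniform if None.
        """
        n_agents = len(agent_params)
        if weights is None:
            weights = np.full(n_agents, 1.0 / n_agents)
        else:
            weights = np.asarray(weights, dtype=float)
            weights = weights / weights.sum()
        averaged = []
        for layer_idx in range(len(agent_params[0])):
            # Weighted sum of the same layer across all agents.
            layer = sum(w * p[layer_idx]
                        for w, p in zip(weights, agent_params))
            averaged.append(layer)
        return averaged

    # Toy usage: three agents, each with a two-layer "actor".
    rng = np.random.default_rng(0)
    agents = [[rng.standard_normal((4, 8)), rng.standard_normal((8, 2))]
              for _ in range(3)]
    global_actor = fed_average(agents)
    # Broadcast the averaged model back; each agent resumes local training.
    agents = [[layer.copy() for layer in global_actor] for _ in agents]

Averaging model parameters rather than exchanging raw trajectories is what lets the vehicles pool their experience while avoiding the data isolation the abstract mentions; the distributed variant (DF-MADDPG) would further decentralize where this averaging happens to cut communication overhead.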
Pages: 9107-9112
Page count: 6