Multiagent Q-learning based UAV trajectory planning for effective situational awareness

Cited by: 7
Authors
Akin, Erdal [1 ]
Demir, Kubilay [2 ]
Yetgin, Halil [2 ,3 ]
Affiliations
[1] Bitlis Eren Univ, Dept Comp Engn, Bitlis, Turkey
[2] Bitlis Eren Univ, Dept Elect & Elect Engn, Bitlis, Turkey
[3] Jozef Stefan Inst, Dept Commun Syst, Ljubljana, Slovenia
Keywords
Reinforcement learning; postdisaster recovery; trajectory planning; flying ad-hoc networks
DOI
10.3906/elk-2012-41
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the event of a natural disaster, the arrival time of search and rescue (SAR) teams in the affected areas is of vital importance for saving the lives of victims. In particular, when an earthquake strikes a geographically large area, reconnaissance of the debris within a short time is critical for conducting successful SAR missions. Effective and rapid situational awareness in postdisaster scenarios can be provided with the help of unmanned aerial vehicles (UAVs). However, off-the-shelf UAVs suffer from limited communication range as well as limited airborne duration due to battery constraints. If the telecommunication infrastructure is destroyed in such a disaster, the maximum area that a ground station (GS) can monitor through UAVs is limited to a single UAV's wireless coverage, regardless of how many UAVs are deployed. Additionally, performing a blind search within the affected area could induce significant delays in SAR missions, leading to inefficient use of the limited battery energy. To address these issues, we develop a multiagent Q-learning based trajectory planning algorithm that maintains all-time connectivity to the GS in a multihop manner and enables UAVs to observe as many critical (highly populated) areas as possible. Comprehensive experimental results demonstrate that the proposed multiagent Q-learning algorithm attains UAV trajectories that cover significantly larger portions of the critical areas, up to 43% more than existing algorithms such as extended versions of the Monte Carlo, greedy, and random algorithms.
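To make the abstract's scheme concrete: each UAV runs tabular Q-learning over a discretized map, earning reward for newly observed critical cells, while any move that would break the multihop link back to the GS is penalized and rejected. The following is a minimal, illustrative sketch of that idea, assuming a grid world, independent per-agent Q-tables, epsilon-greedy exploration, and Chebyshev-distance connectivity; all names and parameters (GRID, COMM_RANGE, CRITICAL, reward values) are our assumptions, not the authors' implementation.

```python
# Illustrative sketch of independent (multiagent) tabular Q-learning for
# UAV trajectory planning on a grid, in the spirit of the abstract above.
# All parameters are assumptions, not the paper's actual design.
import random
from collections import defaultdict

GRID = 10                  # grid world is GRID x GRID cells
ACTIONS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]  # hover, N, S, E, W
GS = (0, 0)                # ground station position
COMM_RANGE = 3             # max hop distance (Chebyshev) between relays
N_UAVS = 3
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
CRITICAL = {(7, 7), (8, 2), (2, 8), (5, 5)}  # highly populated cells

def connected(positions):
    """True if every UAV reaches the GS via a multihop chain of UAVs."""
    reached, frontier = set(), [GS]
    while frontier:
        cur = frontier.pop()
        for i, p in enumerate(positions):
            if i not in reached and max(abs(p[0] - cur[0]),
                                        abs(p[1] - cur[1])) <= COMM_RANGE:
                reached.add(i)
                frontier.append(p)
    return len(reached) == len(positions)

def step(pos, a):
    """Apply a move and clamp to the grid boundary."""
    x, y = pos[0] + a[0], pos[1] + a[1]
    return (min(max(x, 0), GRID - 1), min(max(y, 0), GRID - 1))

# one Q-table per agent: Q[i][(state, action_index)] -> value
Q = [defaultdict(float) for _ in range(N_UAVS)]

for episode in range(2000):
    positions = [GS] * N_UAVS
    visited = set()
    for t in range(40):                      # battery-limited horizon
        for i in range(N_UAVS):
            s = positions[i]
            if random.random() < EPS:        # epsilon-greedy exploration
                ai = random.randrange(len(ACTIONS))
            else:
                ai = max(range(len(ACTIONS)), key=lambda a: Q[i][(s, a)])
            nxt = step(s, ACTIONS[ai])
            trial = positions[:i] + [nxt] + positions[i + 1:]
            if not connected(trial):         # reject moves breaking GS link
                reward, nxt = -1.0, s
            else:
                reward = 1.0 if nxt in CRITICAL and nxt not in visited else 0.0
                visited.add(nxt)
                positions = trial
            best_next = max(Q[i][(nxt, a)] for a in range(len(ACTIONS)))
            Q[i][(s, ai)] += ALPHA * (reward + GAMMA * best_next - Q[i][(s, ai)])
```

Independent learners with per-agent Q-tables are the simplest multiagent variant; the paper's exact state, action, and reward design, as well as its extended Monte Carlo, greedy, and random baselines, may differ from this sketch.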
Pages: 2561-2579
Page count: 19