Multiagent Q-learning based UAV trajectory planning for effective situational awareness

Cited by: 7
Authors
Akin, Erdal [1 ]
Demir, Kubilay [2 ]
Yetgin, Halil [2 ,3 ]
Affiliations
[1] Bitlis Eren Univ, Dept Comp Engn, Bitlis, Turkey
[2] Bitlis Eren Univ, Dept Elect & Elect Engn, Bitlis, Turkey
[3] Jozef Stefan Inst, Dept Commun Syst, Ljubljana, Slovenia
Keywords
Reinforcement learning; postdisaster recovery; trajectory planning; flying ad-hoc networks
DOI
10.3906/elk-2012-41
Chinese Library Classification (CLC)
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In the event of a natural disaster, the arrival time of search and rescue (SAR) teams in the affected areas is of vital importance for saving the lives of victims. In particular, when an earthquake strikes a geographically large area, reconnaissance of the debris within a short time is critical for conducting successful SAR missions. Effective and rapid situational awareness in postdisaster scenarios can be provided with the help of unmanned aerial vehicles (UAVs). However, off-the-shelf UAVs suffer from limited communication range as well as limited airborne duration due to battery constraints. If the telecommunication infrastructure is destroyed in such a disaster, the maximum area that a ground station (GS) can monitor through UAVs is limited to a single UAV's wireless coverage, regardless of how many UAVs are deployed. Additionally, performing a blind search within the affected area could induce significant delays in SAR missions, leading to inefficient use of the limited battery energy. To address these issues, we develop a multiagent Q-learning based trajectory planning algorithm that maintains all-time connectivity to the GS in a multihop manner and enables UAVs to observe as many critical (highly populated) areas as possible. Comprehensive experimental results demonstrate that the proposed multiagent Q-learning algorithm attains UAV trajectories that cover significantly larger portions of the critical areas, up to 43% more than existing algorithms such as extended versions of the Monte Carlo, greedy, and random algorithms.
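To make the abstract's scheme concrete: each UAV runs tabular Q-learning over a discretized map, earning reward for newly observed critical cells, while any move that would break the multihop link back to the GS is penalized and rejected. The following is a minimal, illustrative sketch of that idea, assuming a grid world, independent per-agent Q-tables, epsilon-greedy exploration, and Chebyshev-distance connectivity; all names and parameters (GRID, COMM_RANGE, CRITICAL, reward values) are our assumptions, not the authors' implementation.

```python
# Illustrative sketch of independent (multiagent) tabular Q-learning for
# UAV trajectory planning on a grid, in the spirit of the abstract above.
# All parameters are assumptions, not the paper's actual design.
import random
from collections import defaultdict

GRID = 10                  # grid world is GRID x GRID cells
ACTIONS = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]  # hover, N, S, E, W
GS = (0, 0)                # ground station position
COMM_RANGE = 3             # max hop distance (Chebyshev) between relays
N_UAVS = 3
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
CRITICAL = {(7, 7), (8, 2), (2, 8), (5, 5)}  # highly populated cells

def connected(positions):
    """True if every UAV reaches the GS via a multihop chain of UAVs."""
    reached, frontier = set(), [GS]
    while frontier:
        cur = frontier.pop()
        for i, p in enumerate(positions):
            if i not in reached and max(abs(p[0] - cur[0]),
                                        abs(p[1] - cur[1])) <= COMM_RANGE:
                reached.add(i)
                frontier.append(p)
    return len(reached) == len(positions)

def step(pos, a):
    """Apply a move and clamp to the grid boundary."""
    x, y = pos[0] + a[0], pos[1] + a[1]
    return (min(max(x, 0), GRID - 1), min(max(y, 0), GRID - 1))

# one Q-table per agent: Q[i][(state, action_index)] -> value
Q = [defaultdict(float) for _ in range(N_UAVS)]

for episode in range(2000):
    positions = [GS] * N_UAVS
    visited = set()
    for t in range(40):                      # battery-limited horizon
        for i in range(N_UAVS):
            s = positions[i]
            if random.random() < EPS:        # epsilon-greedy exploration
                ai = random.randrange(len(ACTIONS))
            else:
                ai = max(range(len(ACTIONS)), key=lambda a: Q[i][(s, a)])
            nxt = step(s, ACTIONS[ai])
            trial = positions[:i] + [nxt] + positions[i + 1:]
            if not connected(trial):         # reject moves breaking GS link
                reward, nxt = -1.0, s
            else:
                reward = 1.0 if nxt in CRITICAL and nxt not in visited else 0.0
                visited.add(nxt)
                positions = trial
            best_next = max(Q[i][(nxt, a)] for a in range(len(ACTIONS)))
            Q[i][(s, ai)] += ALPHA * (reward + GAMMA * best_next - Q[i][(s, ai)])
```

Independent learners with per-agent Q-tables are the simplest multiagent variant; the paper's exact state, action, and reward design, as well as its extended Monte Carlo, greedy, and random baselines, may differ from this sketch.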
Pages: 2561-2579
Page count: 19