Wilderness Search and Rescue Missions using Deep Reinforcement Learning

Cited by: 0

Authors:

Peake, Ashley [1]

McCalmon, Joe [1]

Zhang, Yixin [1]

Raiford, Benjamin [1]

Alqahtani, Sarra [1]

Affiliation:

[1] Wake Forest Univ, Comp Sci Dept, Winston Salem, NC 27101 USA

Source:

2020 IEEE International Symposium on Safety, Security, and Rescue Robotics (SSRR 2020) | 2020

Keywords:

Navigation; search; rescue; deep reinforcement learning; LSTM; UAVs

DOI:

10.1109/ssrr50563.2020.9292613

Chinese Library Classification (CLC) number:

TP301 [Theory, Methods]

Discipline classification code:

081202

Abstract:

Wilderness Search and Rescue (WiSAR) requires navigating large regions, often in rugged, remote areas, in search of missing people or animals. Because the regions are large and ground vehicles may have limited mobility, WiSAR missions are frequently carried out with the help of Unmanned Aerial Vehicles (UAVs). However, executing WiSAR autonomously remains an unsolved challenge. In this paper, we use Deep Reinforcement Learning (DRL) to develop an autonomous WiSAR controller for UAVs. We improve the ability of a UAV agent to learn and understand how to explore a partially observable environment in search of a victim trapped in the wild. The proposed approach breaks this difficult problem into four sub-tasks: tractable mapping of the environment into small regions, region selection, target search, and region exploration. A Quad-Tree is used offline to decompose the environment map into smaller, tractable maps. An efficient cost function is then computed repeatedly to determine the best target region to search in each iteration of the process. Recurrent-DDQN and A2C algorithms are trained to generate optimal policies for the target search and region exploration tasks, respectively. We tested our approach against two baselines: a hard-coded policy that navigates the map in a zigzag fashion, and a policy that uses the same sub-tasks but selects a random action at each time step instead of using the DRL algorithms. The results demonstrate that the proposed approach is capable of navigating through 25 randomly generated environments and finds the missing victim 46% faster than the baselines.
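
The offline Quad-Tree decomposition mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `quadtree_decompose`, the `(row, col, height, width)` region tuple, and the size-based stopping rule `max_size` are all assumptions made for illustration, and the paper may split regions on a different criterion. The cost function used for region selection is not described in this record, so it is not sketched here.

```python
from typing import List, Tuple

# Hypothetical region representation: (row, col, height, width) in grid cells.
Region = Tuple[int, int, int, int]


def quadtree_decompose(row: int, col: int, height: int, width: int,
                       max_size: int) -> List[Region]:
    """Recursively split a rectangular grid into quadrants.

    Splitting stops once a region is at most max_size cells on each side.
    The size-based stopping rule is an assumption for this sketch.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")

    # Leaf: the region is already small enough to be treated as one map.
    if height <= max_size and width <= max_size:
        return [(row, col, height, width)]

    # Split each side into two halves (the second half takes any remainder).
    h1, w1 = height // 2, width // 2
    h2, w2 = height - h1, width - w1
    quadrants = [
        (row, col, h1, w1),            # top-left
        (row, col + w1, h1, w2),       # top-right
        (row + h1, col, h2, w1),       # bottom-left
        (row + h1, col + w1, h2, w2),  # bottom-right
    ]

    leaves: List[Region] = []
    for r, c, h, w in quadrants:
        # Skip empty quadrants, which occur when a side has length 1.
        if h > 0 and w > 0:
            leaves.extend(quadtree_decompose(r, c, h, w, max_size))
    return leaves


if __name__ == "__main__":
    regions = quadtree_decompose(0, 0, 8, 8, max_size=2)
    print(len(regions), "regions; first:", regions[0])
```

Under these assumptions, an 8 x 8 grid with `max_size=2` yields 16 non-overlapping 2 x 2 regions that together cover every cell, which is the kind of small, tractable map a per-region search policy could operate on.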

Pages: 102-107

Page count: 6
