AdverSAR: Adversarial Search and Rescue via Multi-Agent Reinforcement Learning

Cited by: 4
Authors
Rahman, Aowabin [1 ]
Bhattacharya, Arnab [1 ]
Ramachandran, Thiagarajan [1 ]
Mukherjee, Sayak [1 ]
Sharma, Himanshu [1 ]
Fujimoto, Ted [2 ]
Chatterjee, Samrat [3 ]
Affiliations
[1] Pacific Northwest National Laboratory, Optimization & Control Group, Richland, WA, USA
[2] Pacific Northwest National Laboratory, Data Analytics Group, Richland, WA, USA
[3] Pacific Northwest National Laboratory, Data Science & Machine Intelligence Group, Richland, WA, USA
Source
2022 IEEE International Symposium on Technologies for Homeland Security (HST), 2022
Keywords
Search and Rescue; Multi-agent Reinforcement Learning; Adversarial Reinforcement Learning; Critical Infrastructure Security
DOI
10.1109/HST56032.2022.10025434
Chinese Library Classification (CLC)
TP39 [Computer applications]
Subject Classification Codes
081203; 0835
Abstract
Search and Rescue (SAR) missions in remote environments often employ autonomous multi-robot systems that learn, plan, and execute a combination of local single-robot control actions, group primitives, and global mission-oriented coordination and collaboration. Often, SAR coordination strategies are manually designed by human experts who can remotely control the multi-robot system and enable semi-autonomous operations. However, in remote environments where connectivity is limited and human intervention is often not possible, decentralized collaboration strategies are needed for fully-autonomous operations. Nevertheless, decentralized coordination may be ineffective in adversarial environments due to sensor noise, actuation faults, or manipulation of inter-agent communication data. In this paper, we propose an algorithmic approach based on adversarial multi-agent reinforcement learning (MARL) that allows robots to efficiently coordinate their strategies in the presence of adversarial inter-agent communications. In our setup, the objective of the multi-robot team is to discover targets strategically in an obstacle-strewn geographical area by minimizing the average time needed to find the targets. It is assumed that the robots have no prior knowledge of the target locations, and they can interact with only a subset of neighboring robots at any time. Based on the centralized training with decentralized execution (CTDE) paradigm in MARL, we utilize a hierarchical meta-learning framework to learn dynamic team-coordination modalities and discover emergent team behavior under complex cooperative-competitive scenarios. The effectiveness of our approach is demonstrated on a collection of prototype grid-world environments with different specifications of benign and adversarial agents, target locations, and agent rewards.
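To make the problem setup concrete, the sketch below is a minimal toy version of the grid-world search task described in the abstract: agents move on an obstacle-strewn grid, targets are hidden until an agent reaches their cell, and the team metric is the average discovery time. This is a hypothetical illustration written for this record, not the authors' implementation; all class and function names (`GridSAREnv`, `random_rollout`) are invented here, and the random-walk baseline stands in for the learned MARL policy.

```python
import random

class GridSAREnv:
    """Toy grid-world SAR environment (illustrative sketch, not the paper's code).

    Agents move on an N x N grid with impassable obstacle cells. Target
    locations are unknown to the agents and are discovered only when an
    agent steps onto them. The team objective is to minimize the average
    time step at which targets are discovered."""

    MOVES = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # four cardinal directions

    def __init__(self, size=8, n_agents=3, targets=None, obstacles=None, seed=0):
        self.size = size
        self.rng = random.Random(seed)
        self.obstacles = set(obstacles or [])
        self.targets = set(targets or [])
        self.agents = [(0, i) for i in range(n_agents)]  # start along one edge
        self.found = {}  # target cell -> time step at which it was discovered

    def step(self, actions, t):
        """Apply one move index per agent; return True once all targets are found."""
        for i, a in enumerate(actions):
            dx, dy = self.MOVES[a]
            x, y = self.agents[i]
            nx, ny = x + dx, y + dy
            # Moves off-grid or into an obstacle leave the agent in place.
            if 0 <= nx < self.size and 0 <= ny < self.size and (nx, ny) not in self.obstacles:
                self.agents[i] = (nx, ny)
            if self.agents[i] in self.targets and self.agents[i] not in self.found:
                self.found[self.agents[i]] = t
        return len(self.found) == len(self.targets)

def random_rollout(env, max_steps=500):
    """Independent random walkers as a naive baseline; a trained MARL policy
    would replace this. Returns the average discovery time of found targets."""
    for t in range(1, max_steps + 1):
        actions = [env.rng.randrange(4) for _ in env.agents]
        if env.step(actions, t):
            break
    return sum(env.found.values()) / max(len(env.found), 1)
```

The adversarial setting of the paper would further corrupt what each agent learns about its neighbors (e.g., perturbing shared observations before they reach teammates); this sketch covers only the benign search dynamics.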
Pages: 7
Cited References
12 in total (10 listed below)
  • [1] Behjat A., Manjunatha H., Kumar P.K., Jani A., Collins L., Ghassemi P., Distefano J., Doermann D., Dantu K., Esfahani E., Chowdhury S. Learning Robot Swarm Tactics over Complex Adversarial Environments. 2021 International Symposium on Multi-Robot and Multi-Agent Systems (MRS), 2021: 83-91.
  • [2] Wang D. 2020 IEEE 3rd International Conference on Information Systems and Computer Aided Education (ICISCAE), 2020: 1. DOI 10.1109/ICISCAE51034.2020.9236869.
  • [3] Iqbal S., 2021, arXiv:1905.12127.
  • [4] Lowe R., 2017, Advances in Neural Information Processing Systems, Vol. 30.
  • [5] Lyu X., 2021, arXiv:2102.04402.
  • [6] Omidshafiei S., 2015, IEEE International Conference on Robotics and Automation (ICRA), 2015: 5962. DOI 10.1109/ICRA.2015.7140035.
  • [7] Papoudakis G., 2019, arXiv:1906.04737.
  • [8] Pinto L., 2017, Proceedings of Machine Learning Research, Vol. 70.
  • [9] Queralta J.P., Taipalmaa J., Pullinen B.C., Sarker V.K., Nguyen Gia T., Tenhunen H., Gabbouj M., Raitoharju J., Westerlund T. Collaborative Multi-Robot Search and Rescue: Planning, Coordination, Perception, and Active Vision. IEEE Access, 2020, 8: 191617-191643.
  • [10] Silver D., 2015, Lecture 2: Markov Decision Processes.