A deep reinforcement learning approach for multi-agent mobile robot patrolling

Cited by: 12
Authors
Jana, Meghdeep [1 ]
Vachhani, Leena [1 ]
Sinha, Arpita [1 ]
Affiliations
[1] Indian Inst Technol, Autonomous Robots & Multiagent Syst Lab, Syst & Control Engn, Mumbai, Maharashtra, India
Keywords
Multi-agent; patrolling; Markov decision process; Reinforcement learning; Deep learning
DOI
10.1007/s41315-022-00235-1
Chinese Library Classification (CLC) code
TP24 [Robotics]
Discipline classification codes
080202; 1405
Abstract
Patrolling strategies primarily deal with minimising the time taken to visit specific locations and cover an area. The use of intelligent agents has become beneficial for automating patrolling and analysing patrolling patterns. However, practical scenarios demand that these strategies be adaptive to varying conditions and robust against adversaries. Traditional Q-learning based patrolling keeps track of all possible states and actions in a Q-table, making it susceptible to the curse of dimensionality. For multi-agent patrolling to be adaptive across scenarios represented as graphs, we propose a formulation of the Markov Decision Process (MDP) with state representations that can be used by Deep Reinforcement Learning (DRL) approaches such as Deep Q-Networks (DQN). The implemented DQN estimates the MDP from a finite-length state vector and is trained with a novel reward function. The proposed state-space representation is independent of the number of nodes in the graph, thereby addressing scalability with respect to graph size. We also propose a reward function that penalises the agents for lack of global coordination while providing immediate local feedback on their actions. As independent policy learners subject to this MDP and reward function, the DRL agents form a collaborative patrolling strategy. The policies learned by the agents generalise and adapt to multiple behaviours without being explicitly trained or designed to do so. We provide an empirical analysis showing the strategy's adaptive capabilities under changes in agent positions, non-uniform node-visit frequency requirements, changes in the graph structure representing the environment, and induced randomness in the trajectories. DRL-based patrolling thus emerges as a promising strategy for intelligent agents, with the potential to be scalable, adaptive, and robust against adversaries.
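As a rough illustration of the formulation summarised above, the Python sketch below shows one way a fixed-length local state vector and a reward combining immediate local feedback with a global-coordination penalty could be assembled. The feature choice (node idleness), the MAX_DEGREE cap, and the penalty_weight term are illustrative assumptions, not the paper's exact state representation or reward function.

import random
from typing import Dict, List

MAX_DEGREE = 4  # assumed cap on node degree so that the observation length is fixed

def state_vector(node: int, graph: Dict[int, List[int]], idleness: Dict[int, float]) -> List[float]:
    # Fixed-length local observation: idleness of the current node plus up to
    # MAX_DEGREE neighbours (highest idleness first), zero-padded. Its length
    # never depends on the total number of nodes in the graph.
    neighbours = sorted(graph[node], key=lambda n: idleness[n], reverse=True)[:MAX_DEGREE]
    features = [idleness[node]] + [idleness[n] for n in neighbours]
    features += [0.0] * (1 + MAX_DEGREE - len(features))
    return features

def reward(visited_node: int, idleness: Dict[int, float], penalty_weight: float = 0.1) -> float:
    # Immediate local feedback (the idleness cleared at the visited node) minus
    # a penalty proportional to the average idleness remaining across the graph,
    # standing in for the global-coordination penalty mentioned in the abstract.
    local_gain = idleness[visited_node]
    global_penalty = sum(idleness.values()) / len(idleness)
    return local_gain - penalty_weight * global_penalty

# Minimal usage on a 4-node ring graph with random idleness values.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
idleness = {n: float(random.randint(0, 10)) for n in graph}
print(state_vector(0, graph, idleness))
print(reward(random.choice(graph[0]), idleness))

Because the observation length is fixed by MAX_DEGREE rather than by the graph size, the same DQN input layer could in principle be reused across graphs of different sizes, which is the scalability property emphasised in the abstract.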
Pages: 724-745
Number of pages: 22