A Monte Carlo Tree Search approach to finding efficient patrolling schemes on graphs

被引:16
作者
Karwowski, Jan [1 ]
Mandziuk, Jacek [1 ]
机构
[1] Warsaw Univ Technol, Fac Math & Informat Sci, Koszykowa 75, PL-00662 Warsaw, Poland
关键词
Game theory; Stackelberg games; MOTS; Security games; SECURITY GAMES; TIME; UCT; INTERDICTION; INFORMATION; EQUILIBRIA; ALGORITHM;
D O I
10.1016/j.ejor.2019.02.017
中图分类号
C93 [管理学];
学科分类号
12 ; 1201 ; 1202 ; 120202 ;
摘要
In this paper, we propose an evader-defender type of game for modeling multi-step patrolling scenarios on a graph. The game utilizes a specifically designed graph-based setting which captures spatial arrangements of the protected area, for instance industrial premises or warehouses, wherein certain valuable assets are stored. The game is played by two sides: the evader who attempts to steal or destroy the assets and the defender whose aim is to intercept the evader and prevent him/her from accomplishing his/her goal. Real-life specificity of the proposed game assumes information asymmetry between the two sides as the evader can usually observe defender's patrolling schedules prior to making decision of an attack. For this reason, we employ the Stackelberg Game principles to model our game and consequently focus on approximation of Stackelberg Equilibrium during the solution process. To this end we propose a novel approach, called Mixed-UCT, which relies on Upper Confidence Bound applied to Trees algorithm - a variant of Monte Carlo Tree Search. The efficacy of the proposed solution method is experimentally evaluated on randomly generated games played in warehouse-like, industrial environment. The results show that Mixed-UCT is efficient and scales very well for multi-step games with reasonable number of steps, leading to optimal or close-to-optimal strategies. (C) 2019 Elsevier B.V. All rights reserved.
引用
收藏
页码:255 / 268
页数:14
相关论文
共 73 条
[1]   A Deployed Quantal Response-Based Patrol Planning System for the U.S. Coast Guard [J].
An, Bo ;
Ordonez, Fernando ;
Tambe, Milind ;
Shieh, Eric ;
Yang, Rong ;
Baldwin, Craig ;
DiRenzo, Joseph, III ;
Moretti, Kathryn ;
Maule, Ben ;
Meyer, Garrett .
INTERFACES, 2013, 43 (05) :400-420
[2]   Finite-time analysis of the multiarmed bandit problem [J].
Auer, P ;
Cesa-Bianchi, N ;
Fischer, P .
MACHINE LEARNING, 2002, 47 (2-3) :235-256
[3]  
Basar T., 1998, Dynamic Noncooperative Game Theory, DOI DOI 10.1137/1.9781611971132
[4]   Patrolling security games: Definition and algorithms for solving large instances with single patroller and single intruder [J].
Basilico, Nicola ;
Gatti, Nicola ;
Amigoni, Francesco .
ARTIFICIAL INTELLIGENCE, 2012, 184 :78-123
[5]   Sequential Interdiction with Incomplete Information and Learning [J].
Borrero, Juan S. ;
Prokopyev, Oleg A. ;
Saure, Denis .
OPERATIONS RESEARCH, 2019, 67 (01) :72-89
[6]   Sequential Shortest Path Interdiction with Incomplete Information [J].
Borrero, Juan S. ;
Prokopyev, Oleg A. ;
Saure, Denis .
DECISION ANALYSIS, 2016, 13 (01) :68-98
[7]  
Bosansky B, 2015, AAAI CONF ARTIF INTE, P805
[8]   An Exact Double-Oracle Algorithm for Zero-Sum Extensive-Form Games with Imperfect Information [J].
Bosansky, Branislav ;
Kiekintveld, Christopher ;
Lisy, Viliam ;
Pechoucek, Michal .
JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH, 2014, 51 :829-866
[9]   Defending critical infrastructure [J].
Brown, Gerald ;
Carlyle, Matthew ;
Salmeron, Javier ;
Wood, Kevin .
INTERFACES, 2006, 36 (06) :530-544
[10]   Interdicting a Nuclear-Weapons Project [J].
Brown, Gerald G. ;
Carlyle, W. Matthew ;
Harney, Robert C. ;
Skroch, Eric M. ;
Wood, R. Kevin .
OPERATIONS RESEARCH, 2009, 57 (04) :866-877