A comparison of Monte Carlo tree search and rolling horizon optimization for large-scale dynamic resource allocation problems

Cited by: 36
Authors
Bertsimas, Dimitris [1, 2]
Griffith, J. Daniel [3]
Gupta, Vishal [4]
Kochenderfer, Mykel J. [5]
Misic, Velibor V. [6]
Affiliations
[1] MIT, Sloan Sch Management, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[2] MIT, Ctr Operat Res, 77 Massachusetts Ave, Cambridge, MA 02139 USA
[3] MIT, Lincoln Lab, 244 Wood St, Lexington, MA 02420 USA
[4] Univ Southern Calif, Marshall Sch Business, Dept Data Sci & Operat, 3670 Trousdale Pkwy, Los Angeles, CA 90089 USA
[5] Stanford Univ, Dept Aeronaut & Astronaut, 496 Lomita Mall, Stanford, CA 94305 USA
[6] Univ Calif Los Angeles, Anderson Sch Management, 110 Westwood Plaza, Los Angeles, CA 90024 USA
Funding
Natural Sciences and Engineering Research Council of Canada;
Keywords
Dynamic resource allocation; Monte Carlo tree search; Rolling horizon optimization; Wildfire management; Queueing control; INITIAL ATTACK; ALGORITHM; MODEL;
DOI
10.1016/j.ejor.2017.05.032
Chinese Library Classification number
C93 [Management Science];
Discipline classification code
12 ; 1201 ; 1202 ; 120202 ;
Abstract
Dynamic resource allocation (DRA) problems constitute an important class of dynamic stochastic optimization problems that arise in many real-world applications. DRA problems are notoriously difficult to solve since they combine stochastic dynamics with intractably large state and action spaces. Although the artificial intelligence and operations research communities have independently proposed two successful frameworks for solving such problems - Monte Carlo tree search (MCTS) and rolling horizon optimization (RHO), respectively - the relative merits of these two approaches are not well understood. In this paper, we adapt MCTS and RHO to two problems - a problem inspired by tactical wildfire management and a classical problem involving the control of queueing networks - and undertake an extensive computational study comparing the two methods on large-scale instances of both problems in terms of both the state and the action spaces. Both methods are able to greatly improve on a baseline, problem-specific heuristic. On smaller instances, the MCTS and RHO approaches perform comparably, but RHO outperforms MCTS as the size of the problem increases for a fixed computational budget. (C) 2017 Elsevier B.V. All rights reserved.
Pages: 664-678
Number of pages: 15