Beyond games: a systematic review of neural Monte Carlo tree search applications

Cited by: 4
Authors
Kemmerling, Marco [1]
Luetticke, Daniel [1]
Schmitt, Robert H. [1]
Affiliations
[1] Rhein Westfal TH Aachen, Informat Management Mech Engn WZL MQ IMA, Aachen, Germany
Keywords
Monte Carlo tree search; MCTS; Neural Monte Carlo tree search; Reinforcement learning; Model-based reinforcement learning; Decision-time planning; REINFORCEMENT; NETWORKS; GO; DESIGN; SCHEME; SHOGI; CHESS
DOI
10.1007/s10489-023-05240-w
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The advent of AlphaGo and its successors marked the beginning of a new paradigm in playing games using artificial intelligence. This was achieved by combining Monte Carlo tree search, a planning procedure, and deep learning. While the impact on the domain of games has been undeniable, it is less clear how useful similar approaches are in applications beyond games and how they need to be adapted from the original methodology. We perform a systematic literature review of peer-reviewed articles detailing the application of neural Monte Carlo tree search methods in domains other than games. Our goal is to systematically assess how such methods are structured in practice and whether their success can be extended to other domains. We find applications in a variety of domains, many distinct ways of guiding the tree search using learned policy and value functions, and various training methods. Our review maps the current landscape of algorithms in the family of neural Monte Carlo tree search as they are applied to practical problems, which is a first step towards a more principled way of designing such algorithms for specific problems and their requirements.
Pages: 1020-1046
Page count: 27