Tensor Implementation of Monte-Carlo Tree Search for Model-Based Reinforcement Learning

Cited by: 1

Authors:

Balaz, Marek [1]

Tarabek, Peter [1]

Affiliations:

[1] Univ Zilina, Fac Management Sci & Informat, Univ 8215 1, Zilina 01026, Slovakia

Source:

APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 03

Keywords:

Monte-Carlo tree search; reinforcement learning; MuZero; parallel computations; tensor GPU implementation; model-based reinforcement learning; GO; NETWORKS; SHOGI; CHESS; GAME;

DOI:

10.3390/app13031406

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Discipline classification code:

0703 ;

Abstract:

Monte-Carlo tree search (MCTS) is a widely used heuristic search algorithm. In model-based reinforcement learning, MCTS is often used to improve the action selection process. However, model-based reinforcement learning methods need to process a large number of observations during training. If MCTS is involved, one instance of MCTS must be run for each observation in every training iteration. Therefore, there is a need for an efficient method to process multiple instances of MCTS. We propose an MCTS implementation that can process a batch of observations in a fully parallel fashion on a single GPU using tensor operations. We demonstrate the efficiency of the proposed approach on the MuZero reinforcement learning algorithm. Empirical results show that our method outperforms other approaches and scales well with an increasing number of observations and simulations.
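To illustrate what processing a batch of MCTS instances with tensor operations can look like, here is a minimal sketch of one selection step performed for all trees at once. It is not the authors' implementation: it uses NumPy on the CPU rather than a GPU tensor library, a simplified PUCT-style score with a single constant, and hypothetical names (`select_actions`, `visit_counts`, `value_sums`, `priors`, `c_puct`) chosen only for this example.

```python
import numpy as np


def select_actions(visit_counts, value_sums, priors, c_puct=1.25):
    """Choose one child per tree for a whole batch of parent nodes.

    Every argument is an array of shape (batch, num_actions); row i
    holds the child statistics of the current node in tree i.
    The scoring rule is a simplified PUCT formula (an assumption of
    this sketch, not taken from the paper).
    """
    # Mean value of each child; the max() guard avoids division by zero
    # for unvisited children, whose value sum is expected to be 0.
    q = value_sums / np.maximum(visit_counts, 1)
    # Total visits of each parent, kept as a column so it broadcasts
    # across the action axis.
    parent_visits = visit_counts.sum(axis=1, keepdims=True)
    # Exploration bonus: larger for high-prior, rarely visited children.
    u = c_puct * priors * np.sqrt(parent_visits) / (1.0 + visit_counts)
    # One argmax over the action axis selects a child in every tree.
    return np.argmax(q + u, axis=1)


if __name__ == "__main__":
    # Two independent trees with three actions each (made-up numbers).
    visit_counts = np.array([[3.0, 1.0, 0.0], [1.0, 1.0, 2.0]])
    value_sums = np.array([[1.5, 0.9, 0.0], [0.2, 0.1, 1.0]])
    priors = np.array([[0.2, 0.3, 0.5], [0.6, 0.3, 0.1]])
    print(select_actions(visit_counts, value_sums, priors))
```

The point of the sketch is that no Python loop runs over the batch: the same array expression scores every child of every tree, which is the kind of formulation that maps onto GPU tensor operations.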

Pages: 20
