Enhanced Reinforcement Learning Method Combining One-Hot Encoding-Based Vectors for CNN-Based Alternative High-Level Decisions

Cited by: 22

Authors:

Gu, Bonwoo [1]

Sung, Yunsick [2]

Affiliations:

[1] SIMNET Cooperat, Dept M&S, Daejeon 34127, South Korea

[2] Dongguk Univ Seoul, Dept Multimedia Engn, Seoul 04620, South Korea

Source:

APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 03

Funding:

National Research Foundation of Singapore;

Keywords:

gomoku; game artificial intelligence; convolutional neural-networks; one-hot encoding; reinforcement learning;

DOI:

10.3390/app11031291

Chinese Library Classification (CLC) number:

O6 [Chemistry];

Subject classification code:

0703;

Abstract:

Gomoku is a two-player board game that originated in ancient China. Gomoku AIs have been developed with various artificial intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built with AlphaGo's algorithm, defines all possible situations on the Gomoku board using Monte-Carlo tree search (MCTS) and minimizes the probability of learning different correct answers for duplicated Gomoku board situations. However, tree search algorithms lose accuracy because their classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm represents each state as One-Hot Encoding-based vectors and determines the state of the Gomoku board by combining similar states of the One-Hot Encoding-based vectors. In addition, for cases where the position selected by the CNN is already occupied or a stone cannot be placed there, we propose a method for selecting an alternative. We verify the proposed Gomoku AI in GuPyEngine, a Python-based 3D simulation platform.
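The abstract does not spell out the exact encoding or the fallback rule. As a minimal sketch, assuming a three-state one-hot encoding per cell (empty, black, white) and a CNN that outputs one score per board cell, the two ideas might look like the Python below. The cell codes and the names `one_hot_planes`, `is_empty`, and `select_move` are hypothetical illustrations, not the authors' code, and the sketch does not reproduce the paper's step of combining similar one-hot states.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical cell codes; the paper does not specify its board encoding.
EMPTY, BLACK, WHITE = 0, 1, 2
NUM_STATES = 3

Board = List[List[int]]


def one_hot_planes(board: Board) -> List[List[List[int]]]:
    """Encode each cell as a one-hot vector over (EMPTY, BLACK, WHITE).

    The result is channel-first (3 x H x W), a layout a CNN would
    typically consume; exactly one plane is 1 at every cell.
    """
    planes = [[[0] * len(row) for row in board] for _ in range(NUM_STATES)]
    for r, row in enumerate(board):
        for c, cell in enumerate(row):
            planes[cell][r][c] = 1
    return planes


def is_empty(board: Board, r: int, c: int) -> bool:
    """Default legality check: a stone may only go on an empty cell."""
    return board[r][c] == EMPTY


def select_move(
    board: Board,
    scores: List[List[float]],
    allowed: Callable[[Board, int, int], bool] = is_empty,
) -> Optional[Tuple[int, int]]:
    """Pick the best-scoring cell that the `allowed` check accepts.

    If the CNN's top choice is occupied or otherwise disallowed, fall back
    to the next-highest score. Returns None when no allowed cell remains.
    """
    cells = [(r, c) for r in range(len(board)) for c in range(len(board[r]))]
    # Python's sort is stable even with reverse=True, so ties keep row-major order.
    cells.sort(key=lambda rc: scores[rc[0]][rc[1]], reverse=True)
    for r, c in cells:
        if allowed(board, r, c):
            return r, c
    return None


if __name__ == "__main__":
    board = [[EMPTY] * 3 for _ in range(3)]
    board[1][1] = BLACK
    scores = [[0.1, 0.2, 0.1], [0.3, 0.9, 0.4], [0.0, 0.5, 0.2]]
    # The top score (0.9) sits on the occupied center, so the next-best cell wins.
    print(select_move(board, scores))  # (2, 1)
```

A real board would be 15 x 15 rather than the 3 x 3 toy above. The `allowed` parameter is where extra placement rules (for example, forbidden moves under a particular rule set) could be plugged in without changing the fallback loop.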


Pages: 1-15

Page count: 15
