Enhanced Reinforcement Learning Method Combining One-Hot Encoding-Based Vectors for CNN-Based Alternative High-Level Decisions

Cited by: 19
Authors
Gu, Bonwoo [1]
Sung, Yunsick [2]
Affiliations
[1] SIMNET Cooperat, Dept M&S, Daejeon 34127, South Korea
[2] Dongguk Univ Seoul, Dept Multimedia Engn, Seoul 04620, South Korea
Source
APPLIED SCIENCES-BASEL | 2021 / Vol. 11 / Issue 03
Funding
National Research Foundation, Singapore
Keywords
gomoku; game artificial intelligence; convolutional neural-networks; one-hot encoding; reinforcement learning
DOI
10.3390/app11031291
Chinese Library Classification number
O6 [Chemistry]
Discipline classification code
0703
Abstract
Gomoku is a two-player board game that originated in ancient China. Gomoku-playing programs have been developed with various artificial intelligence techniques, such as genetic algorithms and tree search algorithms. Alpha-Gomoku, a Gomoku AI built with Alpha-Go's algorithm, defines all possible situations on the Gomoku board using Monte-Carlo tree search (MCTS) and minimizes the probability of learning other correct answers for duplicated board situations. However, the accuracy of tree search algorithms drops because the classification criteria are set manually. In this paper, we propose an improved reinforcement learning-based high-level decision approach using convolutional neural networks (CNN). The proposed algorithm expresses each state as One-Hot Encoding-based vectors and determines the state of the Gomoku board by combining similar states of the One-Hot Encoding-based vectors. For cases where the position chosen by the CNN is already occupied or otherwise cannot be played, we suggest a method for selecting an alternative. We verify the proposed Gomoku AI method in GuPyEngine, a Python-based 3D simulation platform.
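The two ideas named in the abstract, one-hot encoding of the board and falling back to an alternative when the chosen cell is occupied, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three-class cell encoding, the names `one_hot_board` and `select_move`, and the rule "take the next-highest-scoring empty cell" are all assumptions made for illustration, and the `scores` list merely stands in for the output of a CNN.

```python
# Hypothetical cell codes; the paper's actual encoding is not given in the abstract.
EMPTY, BLACK, WHITE = 0, 1, 2


def one_hot_board(board):
    """Flatten a square board into a one-hot vector, three entries per cell."""
    vector = []
    for row in board:
        for cell in row:
            encoded = [0, 0, 0]
            encoded[cell] = 1  # set the slot matching the cell code
            vector.extend(encoded)
    return vector


def select_move(board, scores):
    """Return (row, col) of the highest-scoring empty cell, or None if the board is full.

    `scores` holds one value per cell in row-major order and stands in for
    a CNN's output (an assumption of this sketch).
    """
    size = len(board)
    # Rank all cell indices from highest to lowest score.
    ranked = sorted(range(size * size), key=lambda i: scores[i], reverse=True)
    for index in ranked:
        row, col = divmod(index, size)
        if board[row][col] == EMPTY:
            return (row, col)  # first playable cell in score order
    return None


# Usage: the top-scoring cell (centre) is occupied, so the next-best cell is chosen.
board = [[EMPTY, EMPTY, EMPTY],
         [EMPTY, BLACK, EMPTY],
         [EMPTY, EMPTY, EMPTY]]
scores = [0.30, 0.05, 0.05, 0.05, 0.40, 0.05, 0.04, 0.03, 0.03]
move = select_move(board, scores)
```

In this sketch the fallback is a simple ranking over the raw scores; the paper's method, which combines similar one-hot encoded states, may choose alternatives differently.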
Pages: 1-15
Page count: 15