Double Q-learning Agent for Othello Board Game

Cited by: 0
Authors
Somasundaram, Thamarai Selvi [1 ]
Panneerselvam, Karthikeyan [1 ]
Bhuthapuri, Tarun
Mahadevan, Harini
Jose, Ashik
Affiliations
[1] Madras Inst Technol, Dept Comp Technol, Chennai, Tamil Nadu, India
Source
2018 10TH INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING (ICOAC) | 2018
Keywords
Double Q-learning; Othello; reinforcement learning; GO;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Applications of computers];
Subject classification code
081203; 0835
Abstract
This paper presents the first application of the Double Q-learning algorithm to the game of Othello. Reinforcement learning has previously been applied successfully to Othello using the canonical algorithms Q-learning and TD-learning, but both suffer from considerable drawbacks: Q-learning tends to overestimate action values during evaluation, while TD-learning can get stuck in local minima. To overcome these disadvantages, we propose a Double Q-learning agent for Othello and show that it performs better than the existing learning agents. In addition to developing and implementing the Double Q-learning agent, we implement Q-learning and TD-learning agents. All agents are trained and tested against two fixed opponents: a random player and a heuristic player. The Double Q-learning agent outperforms the existing learning agents, although it takes longer, on average, to make each move. Further, we show that the Double Q-learning agent performs best with two hidden layers using the tanh activation function.
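The overestimation the abstract attributes to Q-learning is addressed in Double Q-learning by keeping two value tables and letting one table pick the greedy next action while the other evaluates it. The following minimal tabular sketch illustrates that update rule only; the paper's Othello-specific state encoding and neural-network function approximation are omitted, and all names here are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.99  # illustrative learning rate and discount factor

QA = defaultdict(float)  # first value table
QB = defaultdict(float)  # second value table

def double_q_update(s, a, r, s_next, actions_next):
    """One Double Q-learning step: randomly pick a table to update,
    selecting the greedy action with it but evaluating that action
    with the *other* table (this decoupling curbs overestimation)."""
    if random.random() < 0.5:
        if actions_next:  # non-terminal state: bootstrap from the other table
            a_star = max(actions_next, key=lambda a2: QA[(s_next, a2)])
            target = r + GAMMA * QB[(s_next, a_star)]
        else:             # terminal state: target is just the reward
            target = r
        QA[(s, a)] += ALPHA * (target - QA[(s, a)])
    else:
        if actions_next:
            b_star = max(actions_next, key=lambda a2: QB[(s_next, a2)])
            target = r + GAMMA * QA[(s_next, b_star)]
        else:
            target = r
        QB[(s, a)] += ALPHA * (target - QB[(s, a)])
```

At acting time, the agent would typically choose moves greedily with respect to the sum (or average) of `QA` and `QB`, so both tables contribute to the policy.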
Pages: 216-223
Number of pages: 8