Learning to play Go using recursive neural networks

Cited by: 19
Authors
Wu, Lin [1]
Baldi, Pierre [1]
Affiliations
[1] Univ Calif Irvine, Inst Genom & Bioinformat, Sch Informat & Comp Sci, Irvine, CA 92697 USA
Keywords
Computer games; Computer Go; Machine learning; Evaluation function; Recursive neural network; Knowledge transfer;
DOI
10.1016/j.neunet.2008.02.002
Chinese Library Classification (CLC)
TP18 [Theory of artificial intelligence];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Go is an ancient board game that poses unique opportunities and challenges for artificial intelligence. Currently, there are no computer Go programs that can play at the level of a good human player. However, the emergence of large repositories of games is opening the door for new machine learning approaches to address this challenge. Here we develop a machine learning approach to Go, and related board games, focusing primarily on the problem of learning a good evaluation function in a scalable way. Scalability is essential at multiple levels, from the library of local tactical patterns, to the integration of patterns across the board, to the size of the board itself. The system we propose is capable of automatically learning the propensity of local patterns from a library of games. Propensity and other local tactical information are fed into recursive neural networks, derived from a probabilistic Bayesian network architecture. The recursive neural networks in turn integrate local information across the board in all four cardinal directions and produce local outputs that represent local territory ownership probabilities. The aggregation of these probabilities provides an effective strategic evaluation function that is an estimate of the expected area at the end, or at various other stages, of the game. Local area targets for training can be derived from datasets of games played by human players. In this approach, while requiring a learning time proportional to N⁴, skills learned on a board of size N² can easily be transferred to boards of other sizes. A system trained using 9 × 9 amateur game data performs surprisingly well on a test set derived from 19 × 19 professional game data. Possible directions for further improvements are briefly discussed. (C) 2008 Elsevier Ltd. All rights reserved.
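The architecture the abstract describes — local tactical features fed into recurrent sweeps over the board in the four cardinal directions, with the sweep outputs combined into per-point territory ownership probabilities whose sum estimates the expected area — can be sketched roughly as follows. This is a simplified, untrained illustration with made-up dimensions and random weights: the chains below are plain 1-D recurrences per row/column, not the paper's actual DAG-RNN lattice, and all names (`directional_sweep`, `territory_probabilities`, `F`, `H`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N, F, H = 9, 4, 8  # board size, local-feature dim, hidden units (illustrative)

# Hypothetical shared weights: one input/recurrent matrix per cardinal direction.
# The paper learns these from game records; here they are random stand-ins.
W_in = {d: rng.normal(0.0, 0.1, (H, F)) for d in "NESW"}
W_rec = {d: rng.normal(0.0, 0.1, (H, H)) for d in "NESW"}
w_out = rng.normal(0.0, 0.1, 4 * H)  # combines the four directional states

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def directional_sweep(X, d):
    """Propagate a hidden state across the board in one cardinal direction.

    X: (N, N, F) local feature vectors; returns hidden states of shape (N, N, H).
    E/W sweep along rows, S/N along columns.
    """
    Hid = np.zeros((N, N, H))
    for i in range(N):  # one independent chain per row (E/W) or column (S/N)
        h = np.zeros(H)
        idx = range(N) if d in ("E", "S") else range(N - 1, -1, -1)
        for j in idx:
            r, c = (i, j) if d in ("E", "W") else (j, i)
            h = np.tanh(W_in[d] @ X[r, c] + W_rec[d] @ h)
            Hid[r, c] = h
    return Hid

def territory_probabilities(X):
    """P(point ends up as Black territory) for every intersection."""
    sweeps = np.concatenate([directional_sweep(X, d) for d in "NESW"], axis=-1)
    return sigmoid(sweeps @ w_out)  # (N, N) ownership probabilities

def evaluation(X):
    """Strategic evaluation: expected Black area = sum of local probabilities."""
    return float(territory_probabilities(X).sum())

X = rng.normal(size=(N, N, F))  # stand-in for learned local tactical features
probs = territory_probabilities(X)
print(probs.shape, evaluation(X))
```

Because the directional weights are shared across all board positions, the same parameters apply unchanged to a board of any size, which is the mechanism behind the 9 × 9 to 19 × 19 transfer mentioned in the abstract.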
Pages: 1392-1400
Number of pages: 9