Large-Scale Optimization for Evaluation Functions with Minimax Search

被引:29
作者
Hoki, Kunihito [1 ]
Kaneko, Tomoyuki [2 ]
机构
[1] Univ Electrocommun, Dept Commun Engn & Informat, Chofu, Tokyo 182, Japan
[2] Univ Tokyo, Dept Graph & Comp Sci, Tokyo 1138654, Japan
关键词
CARLO TREE-SEARCH; GAME; CHESS;
D O I
10.1613/jair.4217
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
This paper presents a new method, Minimax Tree Optimization (MMTO), to learn a heuristic evaluation function of a practical alpha-beta search program. The evaluation function may be a linear or non-linear combination of weighted features, and the weights are the parameters to be optimized. To control the search results so that the move decisions agree with the game records of human experts, a well-modeled objective function to be minimized is designed. Moreover, a numerical iterative method is used to find local minima of the objective function, and more than forty million parameters are adjusted by using a small number of hyper parameters. This method was applied to shogi, a major variant of chess in which the evaluation function must handle a larger state space than in chess. Experimental results show that the large-scale optimization of the evaluation function improves the playing strength of shogi programs, and the new method performs significantly better than other methods. Implementation of the new method in our shogi program B o n a n z a made substantial contributions to the program's first-place finish in the 2013 World Computer Shogi Championship. Additionally, we present preliminary evidence of broader applicability of our method to other two-player games such as chess.
引用
收藏
页码:527 / 568
页数:42
相关论文
共 62 条
[1]   SOME METHODS OF CONTROLLING TREE SEARCH IN CHESS PROGRAMS [J].
ADELSONVELSKIY, GM ;
ARLAZAROV, VL ;
DONSKOY, MV .
ARTIFICIAL INTELLIGENCE, 1975, 6 (04) :361-371
[2]  
AKL SG, 1977, 1977 ACM ANN C P, P466
[3]   Evaluation tuning for computer chess: Linear discriminant methods [J].
Anantharaman, TS .
ICCA JOURNAL, 1997, 20 (04) :224-242
[4]  
[Anonymous], THESIS U MASSACHUSET
[5]  
[Anonymous], ICCA J
[6]  
[Anonymous], 1983, ICGA J
[7]  
[Anonymous], 1980, AAAI
[8]   Learning to play chess using temporal differences [J].
Baxter, J ;
Tridgell, A ;
Weaver, L .
MACHINE LEARNING, 2000, 40 (03) :243-263
[9]   A GENERALIZED QUIESCENCE SEARCH ALGORITHM [J].
BEAL, DF .
ARTIFICIAL INTELLIGENCE, 1990, 43 (01) :85-98
[10]   Temporal difference learning applied to game playing and the results of application to shogi [J].
Beal, DF ;
Smith, MC .
THEORETICAL COMPUTER SCIENCE, 2001, 252 (1-2) :105-119