An Improved Algorithm Model based on Machine Learning

Times Cited: 0
Authors
Zhou Ke [1 ]
Wong Huan [1 ]
Wu Ruo-fan [1 ]
Qi Xin [1 ]
Affiliations
[1] Univ Sci & Technol Beijing, Sch Adv Engn, Beijing 100083, Peoples R China
Source
2015 27TH CHINESE CONTROL AND DECISION CONFERENCE (CCDC) | 2015
Keywords
Machine Learning; Q-learning; Evaluation Function; Connect6; Computer Game;
DOI
N/A
Chinese Library Classification (CLC)
TP [Automation & Computer Technology]
Discipline Code
0812
Abstract
In recent decades, the Reinforcement Learning (RL) algorithm has attracted increasing attention and has become a research focus in the field of machine learning. This paper introduces a typical RL algorithm, the Q-learning algorithm, into the computer game platform Connect6, and proposes an improved method. We adjust the reward parameter according to the board shapes of Connect6, and optimize the adjustment of the evaluation function to achieve global optimization. Moreover, the optimized reward keeps valueless units away from the evaluation, which reduces their interference with the optimal result and improves the convergence speed, thereby shortening the overall time of the self-learning process.
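The core of the method the abstract describes is the standard Q-learning update rule, Q(s, a) ← Q(s, a) + α·(r + γ·max Q(s′, a′) − Q(s, a)). The sketch below illustrates that rule on a hypothetical toy "chain" environment, which stands in for the Connect6 platform (far too large a state space for a tabular illustration); it is not the paper's exact variant with the adjusted reward parameter.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch of the update rule the paper builds on:
#   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
# The "chain" environment below is a hypothetical stand-in for Connect6.

ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3
N_STATES = 5                       # states 0..4; state 4 is the rewarded goal

def step(state, action):
    """Move left (0) or right (1) along the chain; reward 1.0 at the goal."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

Q = defaultdict(float)             # Q[(state, action)], default 0.0
random.seed(0)

for _ in range(100):               # self-learning episodes
    s = 0
    for _ in range(1000):          # step cap so an episode cannot run forever
        if random.random() < EPSILON:              # epsilon-greedy exploration
            a = random.choice([0, 1])
        else:
            a = max([0, 1], key=lambda act: Q[(s, act)])
        s2, r, done = step(s, a)
        best_next = max(Q[(s2, 0)], Q[(s2, 1)])
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2
        if done:
            break

# The learned greedy policy should move right (action 1) from every state.
policy = [max([0, 1], key=lambda act: Q[(st, act)]) for st in range(N_STATES - 1)]
print(policy)
```

The paper's contribution then concerns shaping the reward and evaluation function so that valueless units do not slow this convergence on the Connect6 state space.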
Pages: 3754-3757
Page count: 4
References
19 in total
[1]  
Autonès M, 2004, LECT NOTES COMPUT SC, V3003, P1
[2]  
Bishop Christopher, 2006, Pattern Recognition and Machine Learning, DOI 10.1117/1.2819119
[3]  
Buro M., 1998, COMPUTERS AND GAMES, V1558, P126, DOI DOI 10.1007/3-540-48957-6_8
[4]  
Ernst D, 2005, J MACH LEARN RES, V6, P503
[5]  
Even-Dar E, 2003, J MACH LEARN RES, V5, P1
[6]  
Tesauro G, 1994, NEURAL COMPUT, V6, P215
[7]  
Geist M, 2014, J MACH LEARN RES, V15, P289
[8]   Tuning evaluation functions by maximizing concordance [J].
Gomboc, D ;
Buro, M ;
Marsland, TA .
THEORETICAL COMPUTER SCIENCE, 2005, 349 (02) :202-229
[9]  
Kaneko T, 2004, INT FED INFO PROC, V135, P279
[10]  
Mitchell T M, 2003, Machine Learning