Research on a back-propagation neural network based Q learning algorithm in multi agent system

Cited by: 0
Authors
Lin, OY [1 ]
Guo, QP [1 ]
Santai, OY [1 ]
Affiliation
[1] Wuhan Univ Technol, Sch Comp Sci & Technol, Wuhan 430063, Hubei, Peoples R China
Source
DCABES 2004, Proceedings, Vols. 1 and 2 | 2004
Keywords
multi agent; neural network; Q learning; reinforcement learning;
DOI
Not available
Chinese Library Classification
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
With the development of artificial intelligence, research on multi-agent reinforcement learning and on neural networks has become increasingly prevalent. The Q learning algorithm, a form of reinforcement learning, is an online learning method. As the scale of the problem increases, the exploration space becomes too large for the traditional Q learning algorithm to handle. The neural network, as a self-organizing, self-adaptive, supervised learning method, can implicitly encode the underlying continuous relationship between the input and the output of a problem. The combination of the neural network with the Q learning algorithm, called the back-propagation neural network based Q learning algorithm (BPNNQ), can reduce the exploration space remarkably by taking advantage of both the neural network and Q learning. Avoiding convergence to a locally optimal solution is another difficult problem in machine learning. Using the Boltzmann distribution strategy in the BPNNQ algorithm mitigates the local optimum problem to a certain extent.
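The abstract names two ingredients: a back-propagation network that approximates Q-values, and Boltzmann (softmax) action selection to keep exploring instead of settling on a local optimum. The following is a minimal sketch of how those two pieces could fit together, not the authors' implementation. The abstract gives no architecture or hyperparameters, so the class `TinyQNet`, the helper names, the single tanh hidden layer, the learning rate, the discount factor `gamma`, and the temperature values are all assumptions made for illustration.

```python
import math
import random


def boltzmann_probs(q_values, temperature):
    """Boltzmann distribution over actions: P(a) proportional to exp(Q(a) / T)."""
    m = max(q_values)  # subtract the max for numerical stability
    weights = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def select_action(q_values, temperature, rng=random):
    """Sample an action index according to the Boltzmann probabilities."""
    probs = boltzmann_probs(q_values, temperature)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def q_target(reward, next_q_values, gamma=0.9):
    """Standard one-step Q learning target: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * max(next_q_values)


class TinyQNet:
    """Hypothetical one-hidden-layer network mapping a state to one Q-value per action."""

    def __init__(self, n_in, n_hidden, n_actions, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_actions)]
        self.lr = lr

    def forward(self, state):
        """Return (hidden activations, Q-values) for the given state vector."""
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)))
                  for row in self.w1]
        q = [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]
        return hidden, q

    def update(self, state, action, target):
        """One back-propagation step on 0.5 * (Q(s, a) - target)^2.

        Returns the error measured before the weights were changed.
        """
        hidden, q = self.forward(state)
        err = q[action] - target
        for j, h in enumerate(hidden):
            # Gradient reaching hidden unit j, using the output weight
            # before it is modified below.
            grad_h = err * self.w2[action][j] * (1.0 - h * h)
            self.w2[action][j] -= self.lr * err * h
            for i, x in enumerate(state):
                self.w1[j][i] -= self.lr * grad_h * x
        return err
```

In a learning loop one would call `forward` on the current state, pick an action with `select_action`, observe the reward and next state, and pass `q_target(...)` to `update`. A high temperature makes the choice nearly uniform, while lowering it over time shifts the agent toward the greedy action; that schedule is again an assumption, since the abstract does not describe one.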
Pages: 784-789
Page count: 6