Modeling and Algorithms of Multi-agent Reinforcement Learning Using Stochastic Game

Cited by: 0
Authors
Xie Guangqiang [1, 2]
Chen Xuesong [1, 3]
Affiliations
[1] Guangdong Univ Technol, Fac Automat, Guangzhou 510006, Guangdong, Peoples R China
[2] Guangdong Univ Technol, Fac Comp, Guangzhou 510006, Guangdong, Peoples R China
[3] Guangdong Univ Technol, Fac Appl Math, Guangzhou 510006, Guangdong, Peoples R China
Source
2010 THE 3RD INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND INDUSTRIAL APPLICATION (PACIIA2010), VOL VII | 2010
Keywords
reinforcement learning; multi-agent systems; convergence; stochastic games; SYSTEMS;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning has been successful at finding optimal control policies through trial-and-error interaction with a dynamic environment. Its self-improving and online learning properties make it one of the most important machine learning methods. We first survey the foundations of reinforcement learning and the mathematical models of its environment, and then discuss the convergence and reward of the algorithms. Next, two traditional algorithms, Q-Learning (QL) and Minimax Q-Learning (MQL), are discussed in detail, and a new algorithm, Opponent Modeling Q-Learning (OMQL), is built on MQL. Finally, simulation results demonstrate that the model converges steadily to the optimal strategy, improves learning performance, and speeds up the convergence of the learning process.
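For readers unfamiliar with the Q-Learning (QL) baseline named in the abstract, the following is a minimal sketch of the standard textbook tabular update, Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a)). It is not the authors' implementation and does not cover MQL or OMQL; the function name, the state and action labels, and the values of alpha and gamma are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning step and return the updated Q-value.

    q is a mapping from (state, action) pairs to estimated returns.
    alpha (learning rate) and gamma (discount factor) are assumed values.
    """
    # Greedy estimate of the value of the next state.
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target and in-place update.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-action example; unseen entries default to 0.0.
q = defaultdict(float)
actions = ["left", "right"]
first = q_learning_update(q, "s0", "right", 1.0, "s1", actions)

# Pretend a later experience made "left" in s1 look valuable.
q[("s1", "left")] = 2.0
second = q_learning_update(q, "s0", "right", 1.0, "s1", actions)
```

A dictionary keyed by (state, action) keeps the sketch independent of any particular environment; a dense array would be the usual choice once the state and action spaces are fixed and enumerable.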
Pages: 375-378
Number of pages: 4
Related papers
9 entries in total
[1]  
[Anonymous], 2000, Multiagent Systems: A Modern Approach to Distributed Artificial Intelligence
[2]  
Arai S., 2000, Proceedings of the Fourth International Conference on Autonomous Agents, P104, DOI 10.1145/336595.337062
[3]  
Bowling M., 2003, MULTIAGENT LEARNING
[4]   A comprehensive survey of multiagent reinforcement learning [J].
Busoniu, Lucian ;
Babuska, Robert ;
De Schutter, Bart .
IEEE TRANSACTIONS ON SYSTEMS MAN AND CYBERNETICS PART C-APPLICATIONS AND REVIEWS, 2008, 38 (02) :156-172
[5]   A reinforcement learning with evolutionary state recruitment strategy for autonomous mobile robots control [J].
Kondo, T ;
Ito, K .
ROBOTICS AND AUTONOMOUS SYSTEMS, 2004, 46 (02) :111-124
[6]   Flocking for multi-agent dynamic systems: Algorithms and theory [J].
Olfati-Saber, R .
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2006, 51 (03) :401-420
[7]   Convergence results for single-step on-policy reinforcement-learning algorithms [J].
Singh, S ;
Jaakkola, T ;
Littman, ML ;
Szepesvári, C .
MACHINE LEARNING, 2000, 38 (03) :287-308
[8]   Multiagent systems: A survey from a machine learning perspective [J].
Stone, P ;
Veloso, M .
AUTONOMOUS ROBOTS, 2000, 8 (03) :345-383
[9]  
Sutton R., 1999, REINFORCEMENT LEARNI