Reinforcement learning of competitive and cooperative skills in soccer agents

Cited by: 4

Authors:

Leng, Jinsong [1]

Lim, Chee Peng [2]

Affiliations:

[1] Edith Cowan Univ, Sch Comp & Secur Sci, Mt Lawley, WA 6050, Australia

[2] Univ Sci Malaysia, Sch Elect & Elect Engn, Nibong Tebal 14300, Penang, Malaysia

Source:

APPLIED SOFT COMPUTING | 2011 / Vol. 11 / Issue 01

Keywords:

Reinforcement learning; Temporal difference learning; On-policy and off-policy; Eligibility traces; Performance; Convergence; APPROXIMATION; CONVERGENCE; TD(LAMBDA);

DOI:

10.1016/j.asoc.2010.04.007

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The main aim of this paper is to provide a comprehensive numerical analysis of the efficiency of various reinforcement learning (RL) techniques in an agent-based soccer game. SoccerBots is employed as a simulation testbed to analyze the effectiveness of RL techniques under various scenarios. A hybrid agent teaming framework for investigating agent team architecture, learning abilities, and other specific behaviours is presented. Novel RL algorithms to verify the competitive and cooperative learning abilities of goal-oriented agents for decision-making are developed. In particular, the tile coding (TC) technique, a function approximation approach, is used to prevent the state space from growing exponentially, hence avoiding the curse of dimensionality. The underlying mechanism of eligibility traces is evaluated in terms of on-policy and off-policy procedures, as well as accumulating traces and replacing traces. The results obtained are analyzed, and the implications of the results for agent teaming and learning are discussed. (C) 2010 Elsevier B.V. All rights reserved.
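
The abstract names tile coding, accumulating versus replacing eligibility traces, and on-policy versus off-policy learning without giving details. The following is a minimal Python sketch of those three ideas, not the authors' implementation. The function names, the unit-square state, the number of tilings, and the tile resolution are all assumptions chosen for illustration.

```python
import numpy as np

def active_tiles(x, y, n_tilings=4, tiles_per_dim=8):
    """Return one active tile index per tiling for a point (x, y) in [0, 1)^2.

    Hypothetical layout: each tiling is a uniform grid shifted by a small offset.
    """
    width = tiles_per_dim + 1  # extra row/column absorbs points shifted past the edge
    indices = []
    for t in range(n_tilings):
        offset = t / (n_tilings * tiles_per_dim)
        col = int((x + offset) * tiles_per_dim)
        row = int((y + offset) * tiles_per_dim)
        indices.append(t * width * width + row * width + col)
    return indices

def update_traces(traces, tiles, gamma, lam, replacing):
    """Decay every trace in place, then mark the currently active tiles."""
    traces *= gamma * lam
    if replacing:
        traces[tiles] = 1.0    # replacing traces: reset to 1
    else:
        traces[tiles] += 1.0   # accumulating traces: add 1
    return traces

def td_target(reward, q_next, next_action, gamma, on_policy):
    """On-policy (Sarsa) bootstraps from the action actually chosen;
    off-policy (Q-learning) bootstraps from the greedy action."""
    bootstrap = q_next[next_action] if on_policy else np.max(q_next)
    return reward + gamma * bootstrap
```

With these assumed defaults the feature vector has 4 x 9 x 9 = 324 entries, and revisiting the same state makes an accumulating trace grow beyond 1 while a replacing trace stays at 1.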


Pages: 1353-1362

Page count: 10

References: 32 in total

