Reinforcement learning soccer teams with incomplete world models

Cited by: 19

Authors:

Wiering, M [1]

Salustowicz, R [1]

Schmidhuber, J [1]

Affiliation:

[1] IDSIA, CH-6900 Lugano, Switzerland

Source:

AUTONOMOUS ROBOTS | 1999 / Volume 7 / Issue 01

Keywords:

reinforcement learning; CMAC; world models; simulated soccer; Q(lambda); evolutionary computation; PIPE

DOI:

10.1023/A:1008921914343

Chinese Library Classification number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) estimating state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs and prioritized sweeping-like algorithms. Variants thereof outperform both Q(lambda)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE), which performed best in previous comparisons.


Pages: 77-88

Page count: 12
