Reinforcement learning soccer teams with incomplete world models

Cited by: 19
Authors
Wiering, M [1]
Salustowicz, R [1]
Schmidhuber, J [1]
Affiliations
[1] IDSIA, CH-6900 Lugano, Switzerland
Keywords
reinforcement learning; CMAC; world models; simulated soccer; Q(lambda); evolutionary computation; PIPE
DOI
10.1023/A:1008921914343
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We use reinforcement learning (RL) to compute strategies for multiagent soccer teams. RL may profit significantly from world models (WMs) estimating state transition probabilities and rewards. In high-dimensional, continuous input spaces, however, learning accurate WMs is intractable. Here we show that incomplete WMs can help to quickly find good action selection policies. Our approach is based on a novel combination of CMACs and prioritized sweeping-like algorithms. Variants thereof outperform both Q(lambda)-learning with CMACs and the evolutionary method Probabilistic Incremental Program Evolution (PIPE) which performed best in previous comparisons.
Pages: 77-88
Number of pages: 12