Combining Policy Search with Planning in Multi-agent Cooperation

Cited by: 0
Authors
Ma, Jie [1]
Cameron, Stephen [1]
Affiliations
[1] Univ Oxford, Comp Lab, Oxford OX1 3QD, England
Source
ROBOCUP 2008: ROBOT SOCCER WORLD CUP XII | 2009 / Vol. 5399
Keywords
Policy Search; Planning; Machine Learning; Multi-agent Systems; REINFORCEMENT; COORDINATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
It is cooperation that essentially differentiates multi-agent systems (MASs) from single-agent intelligence. In realistic MAS applications such as RoboCup, repeated work has shown that traditional machine learning (ML) approaches have difficulty mapping directly from cooperative behaviours to actuator outputs. To overcome this problem, vertical layered architectures are commonly used to break cooperation down into behavioural layers: ML has then been used to generate different low-level skills, and a planning mechanism added to create high-level cooperation. We propose a novel method called Policy Search Planning (PSP), in which Policy Search is used to find an optimal policy for selecting plans from a plan pool. PSP extends an existing gradient-search method (GPOMDP) to a MAS domain. We demonstrate how PSP can be used in RoboCup Simulation, and our experimental results reveal robustness, adaptivity, and outperformance over other methods.
Pages: 532-543
Page count: 12