Making friends on the fly: Cooperating with new teammates

Times Cited: 53
Authors
Barrett, Samuel [1 ,6 ]
Rosenfeld, Avi [2 ]
Kraus, Sarit [3 ,4 ]
Stone, Peter [5 ]
Affiliations
[1] Cogitai Inc, Anaheim, CA 92808 USA
[2] Jerusalem Coll Technol, Dept Ind Engn, IL-9116001 Jerusalem, Israel
[3] Bar Ilan Univ, Dept Comp Sci, IL-5290002 Ramat Gan, Israel
[4] Univ Maryland, Inst Adv Comp Studies, College Pk, MD 20742 USA
[5] Univ Texas Austin, Dept Comp Sci, Austin, TX 78712 USA
[6] Univ Texas Austin, Austin, TX 78712 USA
Funding
National Science Foundation (USA); Israel Science Foundation;
Keywords
Ad hoc teamwork; Multiagent systems; Multiagent cooperation; Reinforcement learning; Pursuit domain; RoboCup soccer;
DOI
10.1016/j.artint.2016.10.005
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Robots are being deployed in an increasing variety of environments for longer periods of time. As the number of robots grows, they will increasingly need to interact with other robots. Additionally, the number of companies and research laboratories producing these robots is increasing, leading to the situation where these robots may not share a common communication or coordination protocol. While standards for coordination and communication may be created, we expect that robots will need to additionally reason intelligently about their teammates with limited information. This problem motivates the area of ad hoc teamwork in which an agent may potentially cooperate with a variety of teammates in order to achieve a shared goal. This article focuses on a limited version of the ad hoc teamwork problem in which an agent knows the environmental dynamics and has had past experiences with other teammates, though these experiences may not be representative of the current teammates. To tackle this problem, this article introduces a new general-purpose algorithm, PLASTIC, that reuses knowledge learned from previous teammates or provided by experts to quickly adapt to new teammates. This algorithm is instantiated in two forms: 1) PLASTIC-Model - which builds models of previous teammates' behaviors and plans behaviors online using these models and 2) PLASTIC-Policy - which learns policies for cooperating with previous teammates and selects among these policies online. We evaluate PLASTIC on two benchmark tasks: the pursuit domain and robot soccer in the RoboCup 2D simulation domain. Recognizing that a key requirement of ad hoc teamwork is adaptability to previously unseen agents, the tests use more than 40 previously unknown teams on the first task and 7 previously unknown teams on the second. 
While PLASTIC assumes that there is some degree of similarity between the current and past teammates' behaviors, no steps are taken in the experimental setup to make sure this assumption holds. The teammates were created by a variety of independent developers and were not designed to share any similarities. Nonetheless, the results show that PLASTIC was able to identify and exploit similarities between its current and past teammates' behaviors, allowing it to quickly adapt to new teammates. (C) 2016 Elsevier B.V. All rights reserved.
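The abstract describes PLASTIC as reusing knowledge from previous teammates and adapting online by selecting among stored models (PLASTIC-Model) or policies (PLASTIC-Policy). A minimal illustrative sketch of the core idea is a belief distribution over a library of previously learned teammate models, reweighted as the new teammate's actions are observed. This is not the authors' code; the polynomial-weights-style update, the `eta` parameter, and the toy model names below are simplifying assumptions for illustration.

```python
def update_beliefs(beliefs, models, state, observed_action, eta=0.2):
    """Reweight each stored teammate model by how well it predicts
    the observed action (a polynomial-weights-style update: belief
    shrinks by at most a factor of 1 - eta per observation).

    beliefs: dict model_name -> probability
    models:  dict model_name -> callable(state, action) -> P(action | state)
    """
    new = {}
    for name, prob in beliefs.items():
        likelihood = models[name](state, observed_action)
        new[name] = prob * (1 - eta * (1 - likelihood))
    total = sum(new.values())
    return {name: p / total for name, p in new.items()}


# Hypothetical toy library: two candidate teammate models for a
# pursuit-style task, one that usually chases and one that acts randomly.
models = {
    "greedy_chaser": lambda s, a: 0.9 if a == "chase" else 0.1,
    "random_mover": lambda s, a: 0.5,
}
beliefs = {"greedy_chaser": 0.5, "random_mover": 0.5}

# Observing a "chase" action shifts belief toward the chasing model,
# which the agent would then use for planning (PLASTIC-Model) or to
# pick the matching stored policy (PLASTIC-Policy).
beliefs = update_beliefs(beliefs, models, state=None, observed_action="chase")
```

A bounded multiplicative update like this lets belief recover quickly if early observations are misleading, which matches the abstract's emphasis on fast adaptation to teammates that only partially resemble past ones.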
Pages: 132-171 (40 pages)