The success or failure of any learning algorithm depends in part on the exploration strategy it employs. However, most exploration strategies assume that the environment is stationary and non-strategic. This work investigates how to design exploration strategies for non-stationary and adversarial environments. Our experimental setting is a two-agent strategic interaction scenario in which the opponent switches between different behavioral patterns. The agent's objective is to learn a model of the opponent's strategy in order to act optimally, despite non-determinism and stochasticity. Our contribution is twofold. First, we present drift exploration as a strategy for switch detection. Second, we propose a new algorithm, called R-max#, that reasons and acts in terms of two objectives: (1) maximizing utilities in the short term while learning, and (2) exploring implicitly over the long term to detect opponent behavioral changes. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model with finite sample complexity.
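To make the idea of drift exploration more concrete, the simplified sketch below shows one way an optimistic R-max-style learner can be made to re-explore by "forgetting" state-action pairs it has not visited recently, so that optimism pushes it back toward them and a switched opponent is eventually noticed. This is an illustrative assumption-laden sketch, not the paper's exact formulation of R-max#: the class name `RMaxSharpSketch`, the parameters `m`, `tau`, and `r_max`, and the reset rule are all hypothetical, and the sketch omits the model-based planning a full R-max-style agent would perform.

```python
# Illustrative sketch of drift exploration in an optimistic, R-max-style
# learner. All names and parameters here are assumptions for illustration,
# not the actual R-max# algorithm.
import random
from collections import defaultdict


class RMaxSharpSketch:
    def __init__(self, actions, m=5, tau=100, r_max=1.0):
        self.actions = actions          # available actions
        self.m = m                      # visits needed before a pair counts as "known"
        self.tau = tau                  # steps after which a known pair is forgotten
        self.r_max = r_max              # optimistic reward for unknown pairs
        self.counts = defaultdict(int)      # (state, action) -> visit count
        self.reward_sum = defaultdict(float)
        self.last_visit = defaultdict(int)
        self.t = 0                      # global step counter

    def value(self, state, action):
        """Optimistic value: unknown pairs get r_max, known pairs their empirical mean."""
        key = (state, action)
        if self.counts[key] < self.m:
            return self.r_max
        return self.reward_sum[key] / self.counts[key]

    def act(self, state):
        """Act greedily with respect to the optimistic values (ties broken at random)."""
        best = max(self.value(state, a) for a in self.actions)
        return random.choice([a for a in self.actions
                              if self.value(state, a) == best])

    def update(self, state, action, reward):
        """Record the outcome, then apply drift exploration: any pair not
        visited for tau steps is reset to "unknown", so optimism forces the
        agent to re-explore it and notice if the opponent has switched."""
        self.t += 1
        key = (state, action)
        self.counts[key] += 1
        self.reward_sum[key] += reward
        self.last_visit[key] = self.t
        for k in list(self.counts):
            if self.t - self.last_visit[k] > self.tau:
                self.counts[k] = 0
                self.reward_sum[k] = 0.0
```

Under this sketch, the forgetting horizon `tau` trades off short-term utility against how quickly an opponent switch can be detected, mirroring the two objectives described above.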