An Exploration Strategy Facing Non-Stationary Agents

Cited by: 0
Authors
Hernandez-Leal, Pablo [1 ]
Zhan, Yusen [2 ]
Taylor, Matthew E. [2 ]
Enrique Sucar, L. [3 ]
de Cote, Enrique Munoz [3, 4]
Affiliations
[1] Ctr Wiskunde & Informat, Amsterdam, Netherlands
[2] Washington State Univ, Pullman, WA 99164 USA
[3] INAOE, Puebla, Mexico
[4] PROWLER Io Ltd, Cambridge, England
Source
AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS | 2017
Funding
EU Horizon 2020;
Keywords
Exploration; non-stationary environments; repeated games;
DOI
Not available
Chinese Library Classification (CLC)
TP [Automation and Computer Technology];
Subject Classification Code
0812;
Abstract
The success or failure of any learning algorithm is partially due to the exploration strategy it uses. However, most exploration strategies assume that the environment is stationary and non-strategic. This work investigates how to design exploration strategies for non-stationary and adversarial environments. Our experimental setting is a two-agent strategic interaction scenario in which the opponent switches between different behavioral patterns. The agent's objective is to learn a model of the opponent's strategy in order to act optimally, despite non-determinism and stochasticity. Our contribution is twofold. First, we present drift exploration as a strategy for switch detection. Second, we propose a new algorithm, called R-max#, that reasons and acts in terms of two objectives: 1) maximizing utility in the short term while learning, and 2) eventually exploring, implicitly, for changes in the opponent's behavior. We provide theoretical results showing that R-max# is guaranteed to detect the opponent's switch and learn a new model, with finite sample complexity.
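As intuition for how drift exploration could be combined with an R-max-style learner in a repeated game, the Python sketch below resets joint-action statistics that have not been refreshed recently, so the agent keeps re-exploring and can notice when the opponent switches strategies. The class name, the parameters m and reset_interval, and the concrete reset rule are illustrative assumptions for this sketch, not the authors' R-max# implementation.

from collections import defaultdict

class DriftExplorationLearner:
    """R-max-style learner with a simple drift-exploration rule for a
    repeated game: joint-action statistics that have not been refreshed
    recently are reset to 'unknown', so the agent keeps probing and can
    notice when the opponent switches strategies. (Illustrative sketch,
    not the authors' R-max# implementation.)"""

    def __init__(self, actions, m=5, reset_interval=50):
        self.actions = list(actions)          # actions available in the repeated game
        self.m = m                            # visits required before an entry counts as 'known'
        self.reset_interval = reset_interval  # how often stale entries are forgotten (drift exploration)
        self.counts = defaultdict(int)        # (my_action, opp_action) -> number of observations
        self.rewards = defaultdict(float)     # (my_action, opp_action) -> summed reward
        self.last_seen = defaultdict(int)     # (my_action, opp_action) -> last round it was observed
        self.t = 0                            # round counter

    def choose_action(self):
        """Optimistic action choice: under-explored actions get an optimistic
        (infinite) value, as in R-max; otherwise use the empirical mean reward."""
        def value(a):
            if any(self.counts[(a, o)] < self.m for o in self.actions):
                return float("inf")
            total = sum(self.counts[(a, o)] for o in self.actions)
            return sum(self.rewards[(a, o)] for o in self.actions) / total
        return max(self.actions, key=value)

    def observe(self, my_action, opp_action, reward):
        """Record one round, then apply drift exploration: periodically reset
        entries that have not been refreshed within reset_interval rounds."""
        self.t += 1
        key = (my_action, opp_action)
        self.counts[key] += 1
        self.rewards[key] += reward
        self.last_seen[key] = self.t
        if self.t % self.reset_interval == 0:
            for k in list(self.counts):
                if self.t - self.last_seen[k] > self.reset_interval:
                    self.counts[k] = 0      # treat the pair as unknown again,
                    self.rewards[k] = 0.0   # forcing re-exploration that can reveal a switch

The reset rule trades short-term utility for the ability to detect behavioral drift, mirroring the two objectives described in the abstract.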
Pages: 922-923
Page count: 2
References
4 entries
[1] Bowling, M., Veloso, M. Multiagent learning using a variable learning rate. Artificial Intelligence, 2002, 136(2): 215-250.
[2] Brafman, R. I., Tennenholtz, M. R-MAX - a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research, 2003, 3(2): 213-231.
[3] Hernandez-Leal, P. Autonomous Agents and Multi-Agent Systems, 2016.
[4] Hernandez-Leal, P., Munoz de Cote, E., Sucar, L. E. A framework for learning and planning against switching strategies in repeated games. Connection Science, 2014, 26(2): 103-122.