Gradient-free Online Learning in Games with Delayed Rewards

Cited by: 0
Authors
Heliou, Amelie [1]
Mertikopoulos, Panayotis [1, 2]
Zhou, Zhengyuan [3, 4]
Affiliations
[1] Criteo AI Lab, Ann Arbor, MI 48104 USA
[2] Univ Grenoble Alpes, LIG, Grenoble INP, Inria, CNRS, F-38000 Grenoble, France
[3] NYU, Stern Sch Business, New York, NY 10003 USA
[4] IBM Res, Armonk, NY USA
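Source
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 119 | 2020 / Vol. 119
Keywords
OPTIMIZATION;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract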
Motivated by applications to online advertising and recommender systems, we consider a game-theoretic model with delayed rewards and asynchronous, payoff-based feedback. In contrast to previous work on delayed multi-armed bandits, we focus on multi-player games with continuous action spaces, and we examine the long-run behavior of strategic agents that follow a no-regret learning policy (but are otherwise oblivious to the game being played, the objectives of their opponents, etc.). To account for the lack of a consistent stream of information (for instance, rewards can arrive out of order, with an a priori unbounded delay, etc.), we introduce a gradient-free learning policy where payoff information is placed in a priority queue as it arrives. In this general context, we derive new bounds for the agents' regret; furthermore, under a standard diagonal concavity assumption, we show that the induced sequence of play converges to Nash equilibrium (NE) with probability 1, even if the delay between choosing an action and receiving the corresponding reward is unbounded.
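The abstract's idea of placing payoff information in a priority queue as it arrives can be illustrated with a minimal sketch. This is not the authors' algorithm: the class `DelayedFeedbackQueue`, the function `replay`, and the rule "consume at most one item per round, oldest round first" are assumptions made for illustration only. The sketch shows how rewards that arrive late and out of order can still be processed in the order of the rounds in which the actions were played.

```python
import heapq


class DelayedFeedbackQueue:
    """Hypothetical buffer: a min-heap of (round_played, payoff) pairs."""

    def __init__(self):
        self._heap = []

    def push(self, round_played, payoff):
        # Key on the round in which the action was played, not on arrival time.
        heapq.heappush(self._heap, (round_played, payoff))

    def pop_oldest(self):
        # Return the pair with the smallest round_played, or None if empty.
        if not self._heap:
            return None
        return heapq.heappop(self._heap)

    def __len__(self):
        return len(self._heap)


def replay(arrivals, horizon):
    """Simulate `horizon` rounds of feedback arrival.

    arrivals maps an arrival round to a list of (round_played, payoff)
    pairs received in that round. At most one buffered pair is consumed
    per round (an assumption of this sketch).
    """
    queue = DelayedFeedbackQueue()
    consumed = []
    for t in range(1, horizon + 1):
        for round_played, payoff in arrivals.get(t, []):
            queue.push(round_played, payoff)
        consumed.append(queue.pop_oldest())
    return consumed


# Rewards for rounds 3 and 2 arrive together and out of order in round 3.
arrivals = {2: [(1, 0.5)], 3: [(3, 0.1), (2, 0.9)], 5: [(4, 0.2)]}
consumed = replay(arrivals, horizon=5)
```

In this example nothing is available in round 1, and the queue then releases the buffered rewards for rounds 1, 2, 3, and 4 in that order, even though the reward for round 3 was received before the reward for round 2 was pushed. In a learning policy, each released payoff would feed a payoff-based update step; that step is omitted here because the record does not specify it.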
Pages: 10