Bandit Learning in Concave N-Person Games

Cited by: 0
Authors
Bravo, Mario [1 ]
Leslie, David [2 ,3 ]
Mertikopoulos, Panayotis [4 ]
Affiliations
[1] Univ Santiago Chile, Dept Matemat & Ciencia Computac, Santiago, Chile
[2] Univ Lancaster, Lancaster, England
[3] PROWLER Io, Cambridge, England
[4] Univ Grenoble Alpes, CNRS, INRIA, Grenoble INP, LIG, F-38000 Grenoble, France
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018) | 2018 / Vol. 31
Keywords
OPTIMIZATION; DYNAMICS
DOI
Not available
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
This paper examines the long-run behavior of learning with bandit feedback in non-cooperative concave games. The bandit framework accounts for extremely low-information environments where the agents may not even know they are playing a game; as such, the agents' most sensible choice in this setting would be to employ a no-regret learning algorithm. In general, this does not mean that the players' behavior stabilizes in the long run: no-regret learning may lead to cycles, even with perfect gradient information. However, if a standard monotonicity condition is satisfied, our analysis shows that no-regret learning based on mirror descent with bandit feedback converges to Nash equilibrium with probability 1. We also derive an upper bound for the convergence rate of the process that nearly matches the best attainable rate for single-agent bandit stochastic optimization.
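The abstract's setting can be illustrated with a minimal sketch, not the paper's exact algorithm: each player queries only its own realized payoff (bandit feedback), forms a one-point gradient estimate from a random perturbation, and takes a projected gradient ascent step, the Euclidean special case of mirror descent. The two-player quadratic game, step sizes, and sampling radii below are illustrative assumptions; the game is chosen to be strongly monotone so a unique Nash equilibrium exists.

```python
import numpy as np

rng = np.random.default_rng(0)

def payoffs(x):
    """Concave per-player payoffs on [-1, 1]^2 (illustrative choice).

    The pseudo-gradient of this game is strongly monotone: its
    symmetrized Jacobian is -diag(2, 2), so a unique Nash equilibrium exists.
    """
    u1 = -(x[0] - 0.5) ** 2 - 0.1 * x[0] * x[1]
    u2 = -(x[1] + 0.5) ** 2 + 0.1 * x[0] * x[1]
    return np.array([u1, u2])

# Unique Nash equilibrium from the first-order conditions:
#   x1 = 0.5 - 0.05*x2,  x2 = -0.5 + 0.05*x1
NASH = np.array([0.525 / 1.0025, -0.5 + 0.05 * 0.525 / 1.0025])

def bandit_gradient_ascent(T=30000):
    """No-regret learning with bandit feedback (one-point estimator)."""
    x = np.zeros(2)                          # initial joint action
    for t in range(1, T + 1):
        delta = 0.3 / t ** 0.25              # sampling radius, shrinks slowly
        eta = 1.0 / (t + 20)                 # Robbins-Monro step size
        u = rng.choice([-1.0, 1.0], 2)       # each player's perturbation
        played = np.clip(x + delta * u, -1.0, 1.0)
        # Each player sees only its own scalar payoff at the played profile
        # and estimates its own partial gradient from that single query.
        g_hat = payoffs(played) * u / delta
        # Mirror step with Euclidean regularizer = projected gradient ascent.
        x = np.clip(x + eta * g_hat, -1.0, 1.0)
    return x

x_final = bandit_gradient_ascent()
```

Under the monotonicity condition the iterates stabilize near `NASH`; with cyclic dynamics (e.g. a zero-sum coupling with no concave damping) the same loop would instead orbit the equilibrium, which is the contrast the abstract draws.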
Pages: 11