Algorithms for Adversarial Bandit Problems with Multiple Plays

被引:0
作者
Uchiya, Taishi [1 ]
Nakamura, Atsuyoshi [1 ]
Kudo, Mineichi [1 ]
机构
[1] Hokkaido Univ, Grad Sch Informat Sci & Technol, Kita Ku, Sapporo, Hokkaido 0600814, Japan
来源
ALGORITHMIC LEARNING THEORY, ALT 2010 | 2010年 / 6331卷
关键词
Multi-armed bandit problem; adversarial bandit problem; online learning;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
Adversarial bandit problems studied by Auer et al. [4] are multi-armed bandit problems in which no stochastic assumption is made on the nature of the process generating the rewards for actions. In this paper, we extend their theories to the case where k(>= 1) distinct actions are selected at each time step. As algorithms to solve our problem, we analyze an extension of Exp3 [4] and an application of a bandit online linear optimization algorithm [1] in addition to two existing algorithms (Exp3, ComBand [5]) in terms of time and space efficiency and the regret for the best fixed action set. The extension of Exp3, called Exp3. M, performs best with respect to all the measures: it runs in O(K (log k + 1) time and O(K) space, and suffers at most O(root kTK log (K/k)) regret, where K is the number of possible actions and T is the number of iterations. The upper bound of the regret we proved for Exp3. M is an extension of that proved by Auer et al. for Exp3.
引用
收藏
页码:375 / 389
页数:15
相关论文
共 16 条
  • [1] Agrawal R., 1990, Stochastics and Stochastics Reports, V29, P437, DOI 10.1080/17442509008833627
  • [2] ASYMPTOTICALLY EFFICIENT ALLOCATION RULES FOR THE MULTIARMED BANDIT PROBLEM WITH MULTIPLE PLAYS .1. IID REWARDS
    ANANTHARAM, V
    VARAIYA, P
    WALRAND, J
    [J]. IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1987, 32 (11) : 968 - 976
  • [3] [Anonymous], 2009, Competing in the dark: An efficient algorithm for bandit linear optimization
  • [4] [Anonymous], P 22 ANN C LEARN THE
  • [5] Auer P, 2003, SIAM J COMPUT, V32, P48, DOI 10.1137/S0097539701398375
  • [6] Dependent rounding and its applications to approximation algorithms
    Gandhi, Rajiv
    Khuller, Samir
    Parthasarathy, Srinivasan
    Srinivasan, Aravind
    [J]. JOURNAL OF THE ACM, 2006, 53 (03) : 324 - 360
  • [7] György A, 2007, J MACH LEARN RES, V8, P2369
  • [8] Kleinberg R., 2007, CS 683 LEARNING GAME
  • [9] KREIN M, 1940, STUD MATH, V9, P133
  • [10] Mahajan A., 2007, Foundations and Applications of Sensor Management, P121