Power-of-2-Arms for Bandit Learning With Switching Costs

Cited by: 4
Authors
Shi, Ming [1 ]
Lin, Xiaojun [1 ]
Jiao, Lei [2 ]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Univ Oregon, Eugene, OR 97403 USA
Source
PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL SYMPOSIUM ON THEORY, ALGORITHMIC FOUNDATIONS, AND PROTOCOL DESIGN FOR MOBILE NETWORKS AND MOBILE COMPUTING, MOBIHOC 2022 | 2022
Keywords
Bandit learning; switching costs; regret analysis; edge computing with artificial intelligence; MULTIARMED BANDITS;
DOI
10.1145/3492866.3549720
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Motivated by edge computing with artificial intelligence, in this paper we study a bandit-learning problem with switching costs. Existing results in the literature either incur Θ(T^{2/3}) regret with bandit feedback, or rely on free full feedback in order to reduce the regret to O(√T). In contrast, we expand our study to incorporate two new factors. First, full feedback may incur a cost. Second, the player may choose 2 (or more) arms at a time, in which case she is free to use any one of the chosen arms to calculate her loss, and switching costs are incurred only when she changes the set of chosen arms. For the setting where the player pulls only one arm at a time, our new regret lower bound shows that, even when costly full feedback is added, the Θ(T^{2/3}) regret still cannot be improved. However, the dependence on the number of arms may be improved when the full-feedback cost is small. In contrast, for the setting where the player can choose 2 (or more) arms at a time, we provide a novel online learning algorithm that achieves a lower O(√T) regret. Further, our new algorithm requires no full feedback at all. This sharp difference therefore reveals the surprising power of choosing 2 (or more) arms for this type of bandit-learning problem with switching costs. Both our new algorithm and its regret analysis involve several new ideas, which may be of independent interest.
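To make the abstract's cost model concrete, the following toy simulation is a minimal sketch in Python, assuming Bernoulli losses, a unit switching cost, and a naive greedy index; it illustrates the cost accounting only and is not the authors' algorithm. A single-arm player pays a switching cost at every epoch boundary where the pulled arm changes, while a two-arm player may alternate between an incumbent and a probe arm inside the chosen set for free, paying only when the set itself changes.

import numpy as np

rng = np.random.default_rng(0)
T, K = 10_000, 5    # horizon and number of arms (illustrative values)
SWITCH_COST = 1.0   # cost paid whenever the player's choice changes

# Stochastic Bernoulli losses; the means are unknown to the player.
means = rng.uniform(0.2, 0.8, size=K)

def pull(arm):
    return float(rng.random() < means[arm])

def greedy_arm(sums, counts):
    # Lowest empirical mean; unexplored arms look best, so they get tried.
    return int(np.argmin(sums / np.maximum(counts, 1)))

def single_arm(epoch_len):
    """Freeze one arm per epoch; pay SWITCH_COST at each epoch boundary
    where the arm changes (the batching idea behind Theta(T^{2/3}) regret)."""
    total, prev = 0.0, None
    sums, counts = np.zeros(K), np.zeros(K)
    for start in range(0, T, epoch_len):
        arm = greedy_arm(sums, counts)
        total += SWITCH_COST if (prev is not None and arm != prev) else 0.0
        prev = arm
        for _ in range(min(epoch_len, T - start)):
            loss = pull(arm)
            sums[arm] += loss; counts[arm] += 1; total += loss
    return total

def two_arms(epoch_len):
    """Hold a set {incumbent, probe}; alternate inside the set for free and
    pay SWITCH_COST only when the *set* of chosen arms changes."""
    total, prev_set, probe = 0.0, None, 0
    sums, counts = np.zeros(K), np.zeros(K)
    for start in range(0, T, epoch_len):
        incumbent = greedy_arm(sums, counts)
        probe = (probe + 1) % K
        if probe == incumbent:           # keep the set at two distinct arms
            probe = (probe + 1) % K
        chosen = frozenset((incumbent, probe))
        total += SWITCH_COST if (prev_set is not None and chosen != prev_set) else 0.0
        prev_set = chosen
        for t in range(min(epoch_len, T - start)):
            arm = probe if t % 2 == 0 else incumbent  # free within the set
            loss = pull(arm)
            sums[arm] += loss; counts[arm] += 1; total += loss
    return total

best = means.min() * T  # loss of the best fixed arm in hindsight
print("single-arm regret:", single_arm(100) - best)
print("two-arm regret   :", two_arms(100) - best)

Note that the sketch only demonstrates how losses and switching costs accumulate in each setting; the O(√T) guarantee in the paper follows from a carefully designed algorithm rather than this greedy heuristic.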
Pages: 131-140
Number of pages: 10