Power-of-2-Arms for Bandit Learning With Switching Costs

Cited by: 4
Authors
Shi, Ming [1 ]
Lin, Xiaojun [1 ]
Jiao, Lei [2 ]
Affiliations
[1] Purdue Univ, W Lafayette, IN 47907 USA
[2] Univ Oregon, Eugene, OR 97403 USA
Source
PROCEEDINGS OF THE TWENTY-THIRD INTERNATIONAL SYMPOSIUM ON THEORY, ALGORITHMIC FOUNDATIONS, AND PROTOCOL DESIGN FOR MOBILE NETWORKS AND MOBILE COMPUTING, MOBIHOC 2022 | 2022
Keywords
Bandit learning; switching costs; regret analysis; edge computing with artificial intelligence; MULTIARMED BANDITS;
DOI
10.1145/3492866.3549720
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Motivated by edge computing with artificial intelligence, in this paper we study a bandit-learning problem with switching costs. Existing results in the literature either incur Θ(T^{2/3}) regret with bandit feedback, or rely on free full feedback in order to reduce the regret to O(√T). In contrast, we expand our study to incorporate two new factors. First, full feedback may incur a cost. Second, the player may choose 2 (or more) arms at a time, in which case she is free to use any one of the chosen arms to calculate her loss, and switching costs are incurred only when she changes the set of chosen arms. For the setting where the player pulls only one arm at a time, our new regret lower bound shows that, even when costly full feedback is added, the Θ(T^{2/3}) regret still cannot be improved. However, the dependence on the number of arms may be improved when the full-feedback cost is small. In contrast, for the setting where the player can choose 2 (or more) arms at a time, we provide a novel online learning algorithm that achieves a lower O(√T) regret. Further, our new algorithm requires no full feedback at all. This sharp difference therefore reveals the surprising power of choosing 2 (or more) arms for this type of bandit-learning problem with switching costs. Both our new algorithm and its regret analysis involve several new ideas, which may be of independent interest.
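To make the abstract's cost model concrete, the following toy simulation is a minimal sketch in Python, assuming Bernoulli losses, a unit switching cost, and a naive greedy index; it illustrates the cost accounting only and is not the authors' algorithm. A single-arm player pays a switching cost at every epoch boundary where the pulled arm changes, while a two-arm player may alternate between an incumbent and a probe arm inside the chosen set for free, paying only when the set itself changes.

import numpy as np

rng = np.random.default_rng(0)
T, K = 10_000, 5    # horizon and number of arms (illustrative values)
SWITCH_COST = 1.0   # cost paid whenever the player's choice changes

# Stochastic Bernoulli losses; the means are unknown to the player.
means = rng.uniform(0.2, 0.8, size=K)

def pull(arm):
    return float(rng.random() < means[arm])

def greedy_arm(sums, counts):
    # Lowest empirical mean; unexplored arms look best, so they get tried.
    return int(np.argmin(sums / np.maximum(counts, 1)))

def single_arm(epoch_len):
    """Freeze one arm per epoch; pay SWITCH_COST at each epoch boundary
    where the arm changes (the batching idea behind Theta(T^{2/3}) regret)."""
    total, prev = 0.0, None
    sums, counts = np.zeros(K), np.zeros(K)
    for start in range(0, T, epoch_len):
        arm = greedy_arm(sums, counts)
        total += SWITCH_COST if (prev is not None and arm != prev) else 0.0
        prev = arm
        for _ in range(min(epoch_len, T - start)):
            loss = pull(arm)
            sums[arm] += loss; counts[arm] += 1; total += loss
    return total

def two_arms(epoch_len):
    """Hold a set {incumbent, probe}; alternate inside the set for free and
    pay SWITCH_COST only when the *set* of chosen arms changes."""
    total, prev_set, probe = 0.0, None, 0
    sums, counts = np.zeros(K), np.zeros(K)
    for start in range(0, T, epoch_len):
        incumbent = greedy_arm(sums, counts)
        probe = (probe + 1) % K
        if probe == incumbent:           # keep the set at two distinct arms
            probe = (probe + 1) % K
        chosen = frozenset((incumbent, probe))
        total += SWITCH_COST if (prev_set is not None and chosen != prev_set) else 0.0
        prev_set = chosen
        for t in range(min(epoch_len, T - start)):
            arm = probe if t % 2 == 0 else incumbent  # free within the set
            loss = pull(arm)
            sums[arm] += loss; counts[arm] += 1; total += loss
    return total

best = means.min() * T  # loss of the best fixed arm in hindsight
print("single-arm regret:", single_arm(100) - best)
print("two-arm regret   :", two_arms(100) - best)

Note that the sketch only demonstrates how losses and switching costs accumulate in each setting; the O(√T) guarantee in the paper follows from a carefully designed algorithm rather than this greedy heuristic.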
Pages: 131-140
Number of pages: 10