Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs

Cited by: 4
Authors
Alipour-Fanid, Amir [1 ]
Dabaghchian, Monireh [2 ]
Zeng, Kai [3 ]
Affiliations
[1] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
[2] Morgan State Univ, Dept Comp Sci, Baltimore, MD 21251 USA
[3] George Mason Univ, Dept Elect & Comp Engn, Fairfax, VA 22030 USA
Funding
U.S. National Science Foundation;
Keywords
Arm switching costs; multi-armed bandits (MABs); online learning; regret minimization; self-unaware player/learner;
DOI
10.1109/TNNLS.2021.3110194
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study a family of adversarial (a.k.a. nonstochastic) multi-armed bandit (MAB) problems in which the player not only cannot observe the reward of the arm it plays (a self-unaware player) but also incurs switching costs when shifting to another arm. We study two cases. In Case 1, at each round the player can either play or observe the chosen arm, but not both. In Case 2, the player can choose one arm to play and, in the same round, choose another arm to observe. In both cases, the player incurs a cost whenever consecutive arm switches occur, whether the switch is for playing or for observing. We propose two novel online learning-based algorithms, each addressing one of these MAB problems. We theoretically prove that the proposed algorithms for Case 1 and Case 2 achieve sublinear regret of O((K T^3 ln K)^{1/4}) and O(((K-1) T^2 ln K)^{1/3}), respectively, where the latter bound is order-optimal in time, K is the number of arms, and T is the total number of rounds. In Case 2, we extend the player's capability to m > 1 observations per round and show that more observations do not necessarily improve the regret bound, because of the switching costs they incur. However, we derive an upper bound on the switching cost, c <= 1/(m^2)^{1/3}, below which the regret bound improves as the number of observations increases. Finally, through this study, we find that a generalized version of our approach yields an interesting sublinear regret upper bound of Õ(T^{(s+1)/(s+2)}) for any self-unaware bandit player that faces s binary decision dilemmas before taking an action. To further validate and complement the theoretical findings, we conduct extensive performance evaluations on synthetic data generated by nonstochastic MAB environment simulations and on wireless spectrum measurement data collected in a real-world experiment.
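For intuition on the bounds above: the general rate Õ(T^{(s+1)/(s+2)}) evaluates to T^{2/3} at s = 1 and T^{3/4} at s = 2, matching the time dependence of the Case 2 and Case 1 bounds, respectively. The paper's algorithms are not reproduced in this record, so the sketch below is only an assumption about how a self-unaware, block-based Exp3-style player could behave in the Case 1 ("play or observe") setting: it commits to one arm per block to limit switching, spends the first round of each block observing that arm to form an importance-weighted reward estimate, and plays it for the remaining rounds without seeing the rewards it collects. The function name blocked_exp3_play_or_observe, the block length, and the exploration rate are illustrative choices, not the authors' construction.

```python
import numpy as np

def blocked_exp3_play_or_observe(reward_fn, K, T, switching_cost=1.0, seed=0):
    """Illustrative block-based Exp3-style player for the Case 1 setting
    (play or observe, never both).  Not the paper's algorithm: the block
    length and exploration rate are ad hoc choices made for this sketch.
    Rewards returned by reward_fn are assumed to lie in [0, 1]."""
    rng = np.random.default_rng(seed)
    block_len = max(2, int(round(T ** 0.25)))              # ad hoc block length
    n_blocks = max(1, T // block_len)
    gamma = min(1.0, np.sqrt(K * np.log(K) / n_blocks))    # ad hoc exploration rate
    eta = gamma / K                                        # classic Exp3 coupling
    weights = np.ones(K)
    net_reward, prev_arm = 0.0, None

    t = 0
    while t < T:
        probs = (1.0 - gamma) * weights / weights.sum() + gamma / K
        arm = rng.choice(K, p=probs)                       # one committed arm per block
        if prev_arm is not None and arm != prev_arm:
            net_reward -= switching_cost                   # pay the arm-switching cost
        prev_arm = arm

        # First round of the block: observe only (reward is seen, not collected).
        x = reward_fn(arm, t)
        weights[arm] *= np.exp(eta * x / probs[arm])       # importance-weighted update
        weights /= weights.max()                           # rescale to avoid overflow
        t += 1

        # Remaining rounds of the block: play only (reward collected, never seen).
        for _ in range(min(block_len - 1, T - t)):
            net_reward += reward_fn(arm, t)
            t += 1

    return net_reward
```

Holding the arm fixed within each block is the standard device for trading learning speed against switching cost; letting the block length grow as a power of T is what produces sublinear-but-slower rates of the kind stated for Case 1.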
Pages: 2908 - 2922
Page count: 15
Related Papers
50 records in total
  • [1] Multi-armed Bandits with Metric Switching Costs
    Guha, Sudipto
    Munagala, Kamesh
    AUTOMATA, LANGUAGES AND PROGRAMMING, PT II, PROCEEDINGS, 2009, 5556 : 496 - +
  • [2] Simple fixes that accommodate switching costs in multi-armed bandits
    Teymourian, Ehsan
    Yang, Jian
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2025, 320 (03) : 616 - 627
  • [3] Multi-armed bandits with switching penalties
    Asawa, M
    Teneketzis, D
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1996, 41 (03) : 328 - 348
  • [4] Multi-Armed Bandits with Metric Movement Costs
    Koren, Tomer
    Livni, Roi
    Mansour, Yishay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Adapting to Delays and Data in Adversarial Multi-Armed Bandits
    Gyorgy, Andras
    Joulani, Pooria
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [6] Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits
    Wang, Zhiwei
    Wang, Huazheng
    Wang, Hongning
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 15770 - 15777
  • [7] On Kernelized Multi-armed Bandits
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [8] Regional Multi-Armed Bandits
    Wang, Zhiyang
    Zhou, Ruida
    Shen, Cong
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [9] Multi-armed Bandits with Compensation
    Wang, Siwei
    Huang, Longbo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [10] Federated Multi-Armed Bandits
    Shi, Chengshuai
    Shen, Cong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9603 - 9611