Self-Unaware Adversarial Multi-Armed Bandits With Switching Costs

Cited by: 4
Authors
Alipour-Fanid, Amir [1 ]
Dabaghchian, Monireh [2 ]
Zeng, Kai [3 ]
Affiliations
[1] West Virginia Univ, Lane Dept Comp Sci & Elect Engn, Morgantown, WV 26506 USA
[2] Morgan State Univ, Dept Comp Sci, Baltimore, MD 21251 USA
[3] George Mason Univ, Dept Elect & Comp Engn, Fairfax, VA 22030 USA
Funding
U.S. National Science Foundation;
Keywords
Arm switching costs; multi-armed bandits (MABs); online learning; regret minimization; self-unaware player/learner;
DOI
10.1109/TNNLS.2021.3110194
CLC classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study a family of adversarial (a.k.a. nonstochastic) multi-armed bandit (MAB) problems in which the player not only cannot observe the reward of the arm it plays (a self-unaware player) but also incurs switching costs when shifting to another arm. We study two cases. In Case 1, at each round the player can either play or observe the chosen arm, but not both. In Case 2, the player can choose one arm to play and, in the same round, choose another arm to observe. In both cases, the player incurs a cost whenever consecutive arm switches occur, whether the switch is for playing or for observing. We propose two novel online learning-based algorithms, each addressing one of these MAB problems. We theoretically prove that the proposed algorithms for Case 1 and Case 2 achieve sublinear regret of O((K T^3 ln K)^{1/4}) and O(((K-1) T^2 ln K)^{1/3}), respectively, where the latter bound is order-optimal in time, K is the number of arms, and T is the total number of rounds. In Case 2, we extend the player's capability to m > 1 observations per round and show that more observations do not necessarily improve the regret bound, because of the switching costs they incur. However, we derive an upper bound on the switching cost, c <= 1/(m^2)^{1/3}, below which the regret bound improves as the number of observations increases. Finally, through this study, we find that a generalized version of our approach yields an interesting sublinear regret upper bound of Õ(T^{(s+1)/(s+2)}) for any self-unaware bandit player that faces s binary decision dilemmas before taking an action. To further validate and complement the theoretical findings, we conduct extensive performance evaluations on synthetic data generated by nonstochastic MAB environment simulations and on wireless spectrum measurement data collected in a real-world experiment.
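For intuition on the bounds above: the general rate Õ(T^{(s+1)/(s+2)}) evaluates to T^{2/3} at s = 1 and T^{3/4} at s = 2, matching the time dependence of the Case 2 and Case 1 bounds, respectively. The paper's algorithms are not reproduced in this record, so the sketch below is only an assumption about how a self-unaware, block-based Exp3-style player could behave in the Case 1 ("play or observe") setting: it commits to one arm per block to limit switching, spends the first round of each block observing that arm to form an importance-weighted reward estimate, and plays it for the remaining rounds without seeing the rewards it collects. The function name blocked_exp3_play_or_observe, the block length, and the exploration rate are illustrative choices, not the authors' construction.

```python
import numpy as np

def blocked_exp3_play_or_observe(reward_fn, K, T, switching_cost=1.0, seed=0):
    """Illustrative block-based Exp3-style player for the Case 1 setting
    (play or observe, never both).  Not the paper's algorithm: the block
    length and exploration rate are ad hoc choices made for this sketch.
    Rewards returned by reward_fn are assumed to lie in [0, 1]."""
    rng = np.random.default_rng(seed)
    block_len = max(2, int(round(T ** 0.25)))              # ad hoc block length
    n_blocks = max(1, T // block_len)
    gamma = min(1.0, np.sqrt(K * np.log(K) / n_blocks))    # ad hoc exploration rate
    eta = gamma / K                                        # classic Exp3 coupling
    weights = np.ones(K)
    net_reward, prev_arm = 0.0, None

    t = 0
    while t < T:
        probs = (1.0 - gamma) * weights / weights.sum() + gamma / K
        arm = rng.choice(K, p=probs)                       # one committed arm per block
        if prev_arm is not None and arm != prev_arm:
            net_reward -= switching_cost                   # pay the arm-switching cost
        prev_arm = arm

        # First round of the block: observe only (reward is seen, not collected).
        x = reward_fn(arm, t)
        weights[arm] *= np.exp(eta * x / probs[arm])       # importance-weighted update
        weights /= weights.max()                           # rescale to avoid overflow
        t += 1

        # Remaining rounds of the block: play only (reward collected, never seen).
        for _ in range(min(block_len - 1, T - t)):
            net_reward += reward_fn(arm, t)
            t += 1

    return net_reward
```

Holding the arm fixed within each block is the standard device for trading learning speed against switching cost; letting the block length grow as a power of T is what produces sublinear-but-slower rates of the kind stated for Case 1.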
Pages: 2908 - 2922
Page count: 15
Related Papers
50 records in total
  • [1] Multi-armed Bandits with Metric Switching Costs
    Guha, Sudipto
    Munagala, Kamesh
    AUTOMATA, LANGUAGES AND PROGRAMMING, PT II, PROCEEDINGS, 2009, 5556 : 496 - +
  • [2] Simple fixes that accommodate switching costs in multi-armed bandits
    Teymourian, Ehsan
    Yang, Jian
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2025, 320 (03) : 616 - 627
  • [3] Multi-armed bandits with switching penalties
    Asawa, M
    Teneketzis, D
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1996, 41 (03) : 328 - 348
  • [4] Multi-Armed Bandits with Metric Movement Costs
    Koren, Tomer
    Livni, Roi
    Mansour, Yishay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [5] Adapting to Delays and Data in Adversarial Multi-Armed Bandits
    Gyorgy, Andras
    Joulani, Pooria
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [6] Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits
    Wang, Zhiwei
    Wang, Huazheng
    Wang, Hongning
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 15770 - 15777
  • [7] On Kernelized Multi-armed Bandits
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [8] Regional Multi-Armed Bandits
    Wang, Zhiyang
    Zhou, Ruida
    Shen, Cong
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [9] Multi-armed Bandits with Compensation
    Wang, Siwei
    Huang, Longbo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31
  • [10] Federated Multi-Armed Bandits
    Shi, Chengshuai
    Shen, Cong
    THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2021, 35 : 9603 - 9611