Multi-armed bandits with switching penalties

Cited by: 35
Authors
Asawa, M
Teneketzis, D
Affiliations
[1] Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor
Funding
National Science Foundation (United States)
DOI
10.1109/9.486316
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The multi-armed bandit problem with switching penalties (switching cost and switching delays) is investigated. It is shown that under an optimal policy, decisions about the processor allocation need to be made only at stopping times that achieve an appropriate index, the well-known "Gittins index" or a "switching index" that is defined for switching cost and switching delays. An algorithm for the computation of the "switching index" is presented. Furthermore, sufficient conditions for optimality of allocation strategies, based on limited look-ahead techniques, are established. These conditions together with the above-mentioned feature of optimal scheduling policies simplify the search for an optimal allocation policy. For a special class of multi-armed bandits (scheduling of parallel queues with switching penalties and no arrivals), it is shown that the aforementioned property of optimal policies is sufficient to determine an optimal allocation strategy. In general, the determination of optimal allocation policies remains a difficult and challenging task.
Pages: 328-348
Number of pages: 21
Related papers
50 in total
  • [31] Federated Multi-armed Bandits with Personalization
    Shi, Chengshuai
    Shen, Cong
    Yang, Jing
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [32] LEVY BANDITS: MULTI-ARMED BANDITS DRIVEN BY LEVY PROCESSES
    Kaspi, Haya
    Mandelbaum, Avi
    ANNALS OF APPLIED PROBABILITY, 1995, 5 (02): : 541 - 565
  • [33] Multi-armed linear bandits with latent biases
    Kang, Qiyu
    Tay, Wee Peng
    She, Rui
    Wang, Sijie
    Liu, Xiaoqian
    Yang, Yuan-Rui
    Information Sciences, 2024, 660
  • [34] Online Multi-Armed Bandits with Adaptive Inference
    Dimakopoulou, Maria
    Ren, Zhimei
    Zhou, Zhengyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [35] Multi-Armed Bandits for Adaptive Constraint Propagation
    Balafrej, Amine
    Bessiere, Christian
    Paparrizou, Anastasia
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 290 - 296
  • [36] Successive Reduction of Arms in Multi-Armed Bandits
    Gupta, Neha
    Granmo, Ole-Christoffer
    Agrawala, Ashok
    RESEARCH AND DEVELOPMENT IN INTELLIGENT SYSTEMS XXVIII: INCORPORATING APPLICATIONS AND INNOVATIONS IN INTELLIGENT SYSTEMS XIX, 2011, : 181 - +
  • [37] Quantum greedy algorithms for multi-armed bandits
    Hiroshi Ohno
    Quantum Information Processing, 22
  • [38] Algorithms for Differentially Private Multi-Armed Bandits
    Tossou, Aristide C. Y.
    Dimitrakakis, Christos
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2087 - 2093
  • [39] Combinatorial Multi-armed Bandits for Resource Allocation
    Zuo, Jinhang
    Joe-Wong, Carlee
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021,
  • [40] Quantum Reinforcement Learning for Multi-Armed Bandits
    Liu, Yi-Pei
    Li, Kuo
    Cao, Xi
    Jia, Qing-Shan
    Wang, Xu
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 5675 - 5680