KL-UCB-Based Policy for Budgeted Multi-Armed Bandits with Stochastic Action Costs

Times Cited: 3
Authors
Watanabe, Ryo [1 ]
Komiyama, Junpei [2 ]
Nakamura, Atsuyoshi [1 ]
Kudo, Mineichi [1 ]
Affiliations
[1] Hokkaido Univ, Grad Sch Informat Sci & Technol, Sapporo, Hokkaido 0600814, Japan
[2] Univ Tokyo, Inst Ind Sci, Tokyo 1538505, Japan
Keywords
budgeted multi-armed bandits; asymptotically optimal policy; regret analysis; ALLOCATION
DOI
10.1587/transfun.E100.A.2470
Chinese Library Classification
TP3 [Computing Technology, Computer Technology]
Discipline Code
0812
Abstract
We study the budgeted multi-armed bandit problem with stochastic action costs. In this problem, the player not only receives a reward but also pays a cost for each action he/she chooses. The player's goal is to maximize the cumulative reward received before the total cost exceeds the budget. In the classical multi-armed bandit problem, a policy called KL-UCB is known to perform well. We propose KL-UCB-SC, an extension of this policy to the budgeted bandit problem. We prove that KL-UCB-SC is asymptotically optimal in the case of Bernoulli costs and rewards. To the best of our knowledge, this is the first asymptotic optimality result for the budgeted bandit problem. In fact, our regret upper bound is at least four times tighter than that of BTS, which was previously the best known bound for the budgeted bandit problem. Moreover, our empirical simulations show that a tuned variant of KL-UCB-SC performs comparably to state-of-the-art policies such as PD-BwK and BTS.
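For intuition, the following is a minimal Python sketch of a KL-UCB-style policy in the budgeted setting with Bernoulli rewards and costs. The bisection routine computes the standard Bernoulli KL-UCB confidence bound; pairing a reward upper confidence bound with a cost lower confidence bound in a ratio index is an illustrative assumption, and the function names, exploration level, and stopping rule here are a sketch, not the exact KL-UCB-SC index analyzed in the paper.

```python
import math
import random


def bern_kl(p, q, eps=1e-12):
    """KL divergence kl(p, q) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def kl_confidence(mean, n, level, upper):
    """Bisection for the KL confidence bound: the largest (upper=True)
    or smallest (upper=False) q with n * kl(mean, q) <= level."""
    lo, hi = (mean, 1.0) if upper else (0.0, mean)
    for _ in range(32):  # 32 halvings give far more precision than needed
        mid = 0.5 * (lo + hi)
        feasible = n * bern_kl(mean, mid) <= level
        if upper:
            lo, hi = (mid, hi) if feasible else (lo, mid)
        else:
            lo, hi = (lo, mid) if feasible else (mid, hi)
    return lo if upper else hi


def budgeted_kl_ucb(reward_means, cost_means, budget, seed=0):
    """Play Bernoulli arms until the cumulative cost exceeds the budget,
    choosing the arm with the highest (reward UCB) / (cost LCB) ratio."""
    rng = random.Random(seed)
    k = len(reward_means)
    pulls, rew_sum, cost_sum = [0] * k, [0.0] * k, [0.0] * k
    spent = total_reward = 0.0
    t = 0
    while spent < budget:
        t += 1
        if t <= k:
            arm = t - 1  # initialization: pull every arm once
        else:
            # practical exploration level; the theoretical KL-UCB analysis
            # adds a c * log(log t) term
            level = math.log(t)

            def ratio(i):
                ucb = kl_confidence(rew_sum[i] / pulls[i], pulls[i], level, upper=True)
                lcb = kl_confidence(cost_sum[i] / pulls[i], pulls[i], level, upper=False)
                return ucb / max(lcb, 1e-6)  # guard against a zero cost bound

            arm = max(range(k), key=ratio)
        reward = 1.0 if rng.random() < reward_means[arm] else 0.0
        cost = 1.0 if rng.random() < cost_means[arm] else 0.0
        pulls[arm] += 1
        rew_sum[arm] += reward
        cost_sum[arm] += cost
        spent += cost
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    # Arm 1 (reward 0.5, cost 0.4) has the best reward-to-cost ratio.
    print(budgeted_kl_ucb([0.3, 0.5, 0.7], [0.5, 0.4, 0.9], budget=1000.0))
```

The run stops once the cumulative cost exceeds the budget, matching the problem statement above; in this toy instance the middle arm has the best reward-to-cost ratio (0.5/0.4), so the policy should concentrate its pulls there as the budget grows.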
Pages: 2470-2486
Page Count: 17
Related Papers
50 records in total
  • [1] Thompson Sampling for Budgeted Multi-armed Bandits
    Xia, Yingce
    Li, Haifang
    Qin, Tao
    Yu, Nenghai
    Liu, Tie-Yan
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015: 3960-3966
  • [2] UCB-SC: A Fast Variant of KL-UCB-SC for Budgeted Multi-Armed Bandit Problem
    Watanabe, Ryo
    Komiyama, Junpei
    Nakamura, Atsuyoshi
    Kudo, Mineichi
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2018, E101A (03): 662-667
  • [3] Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals
    Heyden, Marco
    Arzamasov, Vadim
    Fouche, Edouard
    Boehm, Klemens
    PROCEEDINGS OF THE 30TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2024, 2024: 1073-1084
  • [4] A Differentially Private Approach for Budgeted Combinatorial Multi-Armed Bandits
    Wang, Hengzhi
    Cui, Laizhong
    Wang, En
    Liu, Jiangchuan
    IEEE TRANSACTIONS ON DEPENDABLE AND SECURE COMPUTING, 2025, 22 (01): 424-439
  • [5] Multi-armed Bandits with Metric Switching Costs
    Guha, Sudipto
    Munagala, Kamesh
    AUTOMATA, LANGUAGES AND PROGRAMMING, PT II, PROCEEDINGS, 2009, 5556: 496+
  • [6] Multi-Armed Bandits with Metric Movement Costs
    Koren, Tomer
    Livni, Roi
    Mansour, Yishay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
  • [7] Stochastic Multi-Armed Bandits with Control Variates
    Verma, Arun
    Hanawal, Manjesh K.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021
  • [8] Stochastic Multi-armed Bandits in Constant Space
    Liau, David
    Price, Eric
    Song, Zhao
    Yang, Ger
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [9] Budgeted Multi-Armed Bandit in Continuous Action Space
    Trovo, Francesco
    Paladino, Stefano
    Restelli, Marcello
    Gatti, Nicola
    ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, 285: 560-568
  • [10] Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
    Lancewicki, Tal
    Segal, Shahar
    Koren, Tomer
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139