Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals

被引:0
作者
Heyden, Marco [1 ]
Arzamasov, Vadim [1 ]
Fouche, Edouard [1 ]
Boehm, Klemens [1 ]
机构
[1] Karlsruhe Inst Technol, Karlsruhe, Germany
来源
PROCEEDINGS OF THE 30TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2024 | 2024年
关键词
Budgeted multi-armed bandits; Multi-armed bandits; Decision making under uncertainty; Online decision making;
D O I
10.1145/3637528.3671833
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
We study the stochastic Budgeted Multi-Armed Bandit (MAB) problem, where a player chooses from K arms with unknown expected rewards and costs. The goal is to maximize the total reward under a budget constraint. A player thus seeks to choose the arm with the highest reward-cost ratio as often as possible. Current approaches for this problem have several issues, which we illustrate. To overcome them, we propose a new upper confidence bound (UCB) sampling policy, omega-UCB, that uses asymmetric confidence intervals. These intervals scale with the distance between the sample mean and the bounds of a random variable, yielding a more accurate and tight estimation of the reward-cost ratio compared to our competitors. We show that our approach has sublinear instance-dependent regret in general and logarithmic regret for parameter rho >= 1, and that it outperforms existing policies consistently in synthetic and real settings.
引用
收藏
页码:1073 / 1084
页数:12
相关论文
共 25 条
[1]  
Ardagna D., 2011, WWW, P177
[2]   Finite-time analysis of the multiarmed bandit problem [J].
Auer, P ;
Cesa-Bianchi, N ;
Fischer, P .
MACHINE LEARNING, 2002, 47 (2-3) :235-256
[3]   Stochastic bandits for multi-platform budget optimization in online advertising [J].
Avadhanula, V ;
Colini-Baldeschi, R. ;
Leonardi, S. ;
Sankararaman, K. A. ;
Schrijvers, O. .
PROCEEDINGS OF THE WORLD WIDE WEB CONFERENCE 2021 (WWW 2021), 2021, :2805-2817
[4]   Bandits with Knapsacks (Extended Abstract) [J].
Badanidiyuru, Ashwinkumar ;
Kleinberg, Robert ;
Slivkins, Aleksandrs .
2013 IEEE 54TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), 2013, :207-216
[5]   A better bound on the variance [J].
Bhatia, R ;
Davis, C .
AMERICAN MATHEMATICAL MONTHLY, 2000, 107 (04) :353-357
[6]  
Borgs C., 2007, P 16 INT C WORLD WID, P531, DOI DOI 10.1145/1242572.1242644
[7]  
Cayci S, 2020, PR MACH LEARN RES, V108, P4388
[8]  
Cayci S, 2019, P ACM MEAS ANAL COMP, V3, DOI [10.1145/3326158, 10.1145/3341617.3326158]
[9]  
Das D., 2022, AAMAS, P345
[10]  
Ding W., 2013, AAAI-13, P232, DOI 10.1609/aaai.v27i1.8637