Parametrized Stochastic Multi-armed Bandits with Binary Rewards

被引：0

作者：

Jiang, Chong ^{[1
]}

Srikant, R. ^{[1
]}

机构：

[1] Univ Illinois, Coordinated Sci Lab, 1101 W Springfield Ave, Urbana, IL 61801 USA

来源：

2011 AMERICAN CONTROL CONFERENCE | 2011年

关键词：

EFFICIENT ALLOCATION RULES; MULTIPLE PLAYS;

D O I：

暂无

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

In this paper, we consider the problem of multiarmed bandits with a large number of correlated arms. We assume that the arms have Bernoulli distributed rewards, independent across time, where the probabilities of success are parametrized by known attribute vectors for each arm, as well as an unknown preference vector, each of dimension n. For this model, we seek an algorithm with a total regret that is sub-linear in time and independent of the number of arms. We present such an algorithm, which we call the Three-phase Algorithm, and analyze its performance. We show an upper bound on the total regret which applies uniformly in time. The asymptotics of this bound show that for any f is an element of omega(log(T)), the total regret can be made to be O(n . f(T)), independent of the number of arms.

引用

页码：119 / 124

页数：6

共 50 条

[1] Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
Lee, Kyungjae
Yang, Hongjun
Lim, Sungbin
Oh, Songhwai
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
[2] Trading off Rewards and Errors in Multi-Armed Bandits
Erraqabi, Akram
Lazaric, Alessandro
Valko, Michal
Brunskill, Emma
Liu, Yun-En
ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 709 - 717
[3] Multi-Armed Bandits With Self-Information Rewards
Weinberger, Nir
Yemini, Michal
IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (11) : 7160 - 7184
[4] Stochastic Multi-Armed Bandits with Control Variates
Verma, Arun
Hanawal, Manjesh K.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
[5] Stochastic Multi-armed Bandits in Constant Space
Liau, David
Price, Eric
Song, Zhao
Yang, Ger
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
[6] The value of information in multi-armed bandits with exponentially distributed rewards
Ryzhov, Ilya O.
Powell, Warren B.
PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS), 2011, 4 : 1363 - 1372
[7] Multi-armed Bandits with Generalized Temporally-Partitioned Rewards
van den Broek, Ronald C.
Litjens, Rik
Sagis, Tobias
Verbeeke, Nina
Gajane, Pratik
ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024, 2024, 14641 : 41 - 52
[8] Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints
Xu, Huanle
Liu, Yang
Lau, Wing Cheong
Li, Rui
PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2554 - 2560
[9] Stochastic Multi-Armed Bandits with Non-Stationary Rewards Generated by a Linear Dynamical System
Gornet, Jonathan
Hosseinzadeh, Mehdi
Sinopoli, Bruno
2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 1460 - 1465
[10] Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
Lancewicki, Tal
Segal, Shahar
Koren, Tomer
Mansour, Yishay
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139

← 1 2 3 4 5 →