Parametrized Stochastic Multi-armed Bandits with Binary Rewards

被引:0
|
作者
Jiang, Chong [1 ]
Srikant, R. [1 ]
机构
[1] Univ Illinois, Coordinated Sci Lab, 1101 W Springfield Ave, Urbana, IL 61801 USA
关键词
EFFICIENT ALLOCATION RULES; MULTIPLE PLAYS;
D O I
暂无
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
In this paper, we consider the problem of multiarmed bandits with a large number of correlated arms. We assume that the arms have Bernoulli distributed rewards, independent across time, where the probabilities of success are parametrized by known attribute vectors for each arm, as well as an unknown preference vector, each of dimension n. For this model, we seek an algorithm with a total regret that is sub-linear in time and independent of the number of arms. We present such an algorithm, which we call the Three-phase Algorithm, and analyze its performance. We show an upper bound on the total regret which applies uniformly in time. The asymptotics of this bound show that for any f is an element of omega(log(T)), the total regret can be made to be O(n . f(T)), independent of the number of arms.
引用
收藏
页码:119 / 124
页数:6
相关论文
共 50 条
  • [1] Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
    Lee, Kyungjae
    Yang, Hongjun
    Lim, Sungbin
    Oh, Songhwai
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [2] Trading off Rewards and Errors in Multi-Armed Bandits
    Erraqabi, Akram
    Lazaric, Alessandro
    Valko, Michal
    Brunskill, Emma
    Liu, Yun-En
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 709 - 717
  • [3] Multi-Armed Bandits With Self-Information Rewards
    Weinberger, Nir
    Yemini, Michal
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2023, 69 (11) : 7160 - 7184
  • [4] Stochastic Multi-Armed Bandits with Control Variates
    Verma, Arun
    Hanawal, Manjesh K.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [5] Stochastic Multi-armed Bandits in Constant Space
    Liau, David
    Price, Eric
    Song, Zhao
    Yang, Ger
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [6] The value of information in multi-armed bandits with exponentially distributed rewards
    Ryzhov, Ilya O.
    Powell, Warren B.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS), 2011, 4 : 1363 - 1372
  • [7] Multi-armed Bandits with Generalized Temporally-Partitioned Rewards
    van den Broek, Ronald C.
    Litjens, Rik
    Sagis, Tobias
    Verbeeke, Nina
    Gajane, Pratik
    ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024, 2024, 14641 : 41 - 52
  • [8] Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints
    Xu, Huanle
    Liu, Yang
    Lau, Wing Cheong
    Li, Rui
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2554 - 2560
  • [9] Stochastic Multi-Armed Bandits with Non-Stationary Rewards Generated by a Linear Dynamical System
    Gornet, Jonathan
    Hosseinzadeh, Mehdi
    Sinopoli, Bruno
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 1460 - 1465
  • [10] Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
    Lancewicki, Tal
    Segal, Shahar
    Koren, Tomer
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139