Multi-Armed Bandits With Self-Information Rewards

Cited: 1
Authors
Weinberger, Nir [1 ]
Yemini, Michal [2 ]
Affiliations
[1] Technion Israel Inst Technol, Fac Elect & Comp Engn, IL-3200003 Haifa, Israel
[2] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel
Funding
Israel Science Foundation;
Keywords
Multi-armed bandits; self-information rewards; entropy estimation; upper confidence bounds; support size estimation; RANDOM-VARIABLES; ENTROPY; BOUNDS; DIVERSITY;
DOI
10.1109/TIT.2023.3299460
Chinese Library Classification
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
This paper introduces the informational multi-armed bandit (IMAB) model, in which at each round, a player chooses an arm, observes a symbol, and receives an unobserved reward in the form of the symbol's self-information. Thus, the expected reward of an arm is the Shannon entropy of the probability mass function of the source that generates its symbols. The player aims to maximize the expected total reward associated with the entropy values of the arms played. Under the assumption that the alphabet size is known, two UCB-based algorithms are proposed for the IMAB model which consider the biases of the plug-in entropy estimator. The first algorithm optimistically corrects the bias term in the entropy estimation. The second algorithm relies on data-dependent confidence intervals that adapt to sources with small entropy values. Performance guarantees are provided by upper bounding the expected regret of each of the algorithms. Furthermore, in the Bernoulli case, the asymptotic behavior of these algorithms is compared to the Lai-Robbins lower bound for the pseudo regret. Additionally, under the assumption that the exact alphabet size is unknown, and instead the player only knows a loose upper bound on it, a UCB-based algorithm is proposed, in which the player aims to reduce the regret caused by the unknown alphabet size in a finite time regime. Numerical results illustrating the expected regret of the algorithms presented in the paper are provided.
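The abstract describes UCB indices built from the plug-in entropy estimator with an optimistic correction for its bias. As an illustration only, here is a minimal sketch of that idea: the `(K-1)/(2n)` term is the classical first-order (Miller-Madow) bias of the plug-in estimator, and the confidence radius is a generic Hoeffding-style term. The function names and the exact constants are assumptions for this sketch; the paper's algorithms use their own, sharper confidence intervals.

```python
import math
import random
from collections import Counter

def plugin_entropy(counts, n):
    """Plug-in (maximum-likelihood) entropy estimate in nats."""
    return -sum((c / n) * math.log(c / n) for c in counts.values() if c > 0)

def ucb_index(counts, n, t, alphabet_size):
    """Optimistic index: plug-in entropy + bias correction + confidence radius.

    (alphabet_size - 1) / (2n) is the first-order bias of the plug-in
    estimator; the radius is a generic Hoeffding-style term. The constants
    here are illustrative, not the paper's.
    """
    h_hat = plugin_entropy(counts, n)
    bias = (alphabet_size - 1) / (2 * n)
    radius = math.log(alphabet_size) * math.sqrt(2 * math.log(t) / n)
    return h_hat + bias + radius

def imab_ucb(sources, alphabet_size, horizon, seed=0):
    """Simulate an informational MAB: pull each arm once, then follow the index."""
    rng = random.Random(seed)
    K = len(sources)
    counts = [Counter() for _ in range(K)]
    pulls = [0] * K
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= K:
            arm = t - 1  # initialization: one pull per arm
        else:
            arm = max(range(K),
                      key=lambda a: ucb_index(counts[a], pulls[a], t, alphabet_size))
        symbol = rng.choices(range(alphabet_size), weights=sources[arm])[0]
        counts[arm][symbol] += 1
        pulls[arm] += 1
        # reward is the symbol's self-information under the arm's true source
        total_reward += -math.log(sources[arm][symbol])
    return pulls, total_reward
```

With two Bernoulli arms, e.g. `sources = [[0.5, 0.5], [0.95, 0.05]]` (entropies about 0.69 and 0.20 nats), the index concentrates pulls on the high-entropy arm, as the regret analysis in the paper would predict for its own algorithms.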
Pages: 7160-7184
Page count: 25
Related Papers
50 records in total
  • [1] The value of information in multi-armed bandits with exponentially distributed rewards
    Ryzhov, Ilya O.
    Powell, Warren B.
    PROCEEDINGS OF THE INTERNATIONAL CONFERENCE ON COMPUTATIONAL SCIENCE (ICCS), 2011, 4 : 1363 - 1372
  • [2] Trading off Rewards and Errors in Multi-Armed Bandits
    Erraqabi, Akram
    Lazaric, Alessandro
    Valko, Michal
    Brunskill, Emma
    Liu, Yun-En
    ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 54, 2017, 54 : 709 - 717
  • [3] Parametrized Stochastic Multi-armed Bandits with Binary Rewards
    Jiang, Chong
    Srikant, R.
    2011 AMERICAN CONTROL CONFERENCE, 2011, : 119 - 124
  • [4] Multi-armed Bandits with Generalized Temporally-Partitioned Rewards
    van den Broek, Ronald C.
    Litjens, Rik
    Sagis, Tobias
    Verbeeke, Nina
    Gajane, Pratik
    ADVANCES IN INTELLIGENT DATA ANALYSIS XXII, PT I, IDA 2024, 2024, 14641 : 41 - 52
  • [5] Combinatorial Multi-Armed Bandits with Concave Rewards and Fairness Constraints
    Xu, Huanle
    Liu, Yang
    Lau, Wing Cheong
    Li, Rui
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 2554 - 2560
  • [6] Multi-player Multi-armed Bandits: Decentralized Learning with IID Rewards
    Kalathil, Dileep
    Nayyar, Naumaan
    Jain, Rahul
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 853 - 860
  • [7] Optimal Algorithms for Stochastic Multi-Armed Bandits with Heavy Tailed Rewards
    Lee, Kyungjae
    Yang, Hongjun
    Lim, Sungbin
    Oh, Songhwai
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 33, NEURIPS 2020, 2020, 33
  • [8] Maximizing and Satisficing in Multi-armed Bandits with Graph Information
    Thaker, Parth K.
    Malu, Mohit
    Rao, Nikhil
    Dasarathy, Gautam
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [9] On Kernelized Multi-armed Bandits
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [10] Multi-armed Bandits with Compensation
    Wang, Siwei
    Huang, Longbo
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 31 (NIPS 2018), 2018, 31