Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Cited by: 0
Authors
Szorenyi, Balazs [1, 5, 6]
Busa-Fekete, Robert [2]
Weng, Paul [3, 4]
Huellermeier, Eyke [2]
Affiliations
[1] INRIA Lille Nord Europe, SequeL Project, 40 Ave Halley, F-59650 Villeneuve d'Ascq, France
[2] Univ Paderborn, Dept Comp Sci, D-33098 Paderborn, Germany
[3] SYSU CMU Joint Inst Engn, Guangzhou 510006, Guangdong, Peoples R China
[4] SYSU CMU Shunde Int Joint Res Inst, Shunde 528300, Peoples R China
[5] MTA SZTE Res Grp Artificial Intelligence, H-6720 Szeged, Hungary
[6] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel
Keywords
BOUNDS
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.
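The idea of characterizing an arm by a quantile rather than a mean can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the four-level scale `SCALE`, the arm names and weights, and the naive uniform-sampling rule in `best_arm_by_quantile` are hypothetical and are not the algorithms proposed in the paper. The sketch only shows that an empirical quantile needs nothing beyond the ordering of the labels, so no averaging is performed.

```python
import random
from math import ceil

# Hypothetical ordinal scale: labels can be compared but not averaged.
SCALE = ["poor", "fair", "good", "excellent"]
RANK = {label: i for i, label in enumerate(SCALE)}


def empirical_quantile(samples, tau):
    """Return the smallest observed label x such that at least a
    tau fraction of the samples is <= x (uses comparisons only)."""
    if not samples or not 0.0 < tau <= 1.0:
        raise ValueError("need a non-empty sample and tau in (0, 1]")
    ordered = sorted(samples, key=RANK.__getitem__)
    k = max(1, ceil(tau * len(ordered)))
    return ordered[k - 1]


def best_arm_by_quantile(arms, tau, pulls, rng):
    """Naive pure exploration (an illustrative baseline, not the paper's
    method): pull every arm `pulls` times and return the arm whose
    empirical tau-quantile ranks highest on the scale."""
    estimates = {}
    for name, weights in arms.items():
        rewards = rng.choices(SCALE, weights=weights, k=pulls)
        estimates[name] = empirical_quantile(rewards, tau)
    winner = max(estimates, key=lambda name: RANK[estimates[name]])
    return winner, estimates


# Hypothetical arms: probability weights over SCALE, in scale order.
ARMS = {
    "A": [0.1, 0.2, 0.6, 0.1],  # true median is "good"
    "B": [0.4, 0.3, 0.2, 0.1],  # true median is "fair"
}

if __name__ == "__main__":
    winner, estimates = best_arm_by_quantile(ARMS, 0.5, 500, random.Random(0))
    print(winner, estimates)
```

Sorting by rank stands in for the comparison relation on the qualitative scale; the integer ranks are used only to order labels and are never added or averaged.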
Pages: 1660-1668
Page count: 9