Qualitative Multi-Armed Bandits: A Quantile-Based Approach

Cited by: 0
Authors
Szorenyi, Balazs [1 ,5 ,6 ]
Busa-Fekete, Robert [2 ]
Weng, Paul [3 ,4 ]
Huellermeier, Eyke [2 ]
Affiliations
[1] INRIA Lille Nord Europe, SequeL Project, 40 Ave Halley, F-59650 Villeneuve d'Ascq, France
[2] Univ Paderborn, Dept Comp Sci, D-33098 Paderborn, Germany
[3] SYSU CMU Joint Inst Engn, Guangzhou 510006, Guangdong, Peoples R China
[4] SYSU CMU Shunde Int Joint Res Inst, Shunde 528300, Peoples R China
[5] MTA SZTE Res Grp Artificial Intelligence, H-6720 Szeged, Hungary
[6] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel
Keywords
BOUNDS;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104; 0812; 0835; 1405;
Abstract
We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite time horizon (pure exploration) and an infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.
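The core idea of the abstract — ranking arms by an empirical quantile of their reward distribution, using only comparisons and never averages — can be illustrated with a minimal sketch. Note this is not the authors' algorithm: the epsilon-greedy exploration scheme, the arm samplers, and the integer-coded ordinal grade scale below are all illustrative assumptions.

```python
import random

def empirical_quantile(samples, tau):
    """Empirical tau-quantile of a list of ordinal samples.

    Ordinal rewards need only a total order (comparable values),
    never arithmetic -- matching the qualitative-reward setting.
    """
    ordered = sorted(samples)
    # index of the order statistic closest to tau * n, clamped to range
    k = max(0, min(len(ordered) - 1, int(tau * len(ordered))))
    return ordered[k]

def quantile_greedy_bandit(arms, tau=0.5, horizon=200, eps=0.1, seed=0):
    """Epsilon-greedy play that ranks arms by their empirical tau-quantile.

    `arms` maps an arm name to a zero-argument sampler returning an
    ordinal reward (here: an integer grade on an ordered scale).
    """
    rng = random.Random(seed)
    history = {a: [] for a in arms}
    for a in arms:                        # pull each arm once to initialise
        history[a].append(arms[a]())
    for _ in range(horizon - len(arms)):
        if rng.random() < eps:
            a = rng.choice(list(arms))    # explore uniformly at random
        else:                             # exploit the best empirical quantile
            a = max(arms, key=lambda x: empirical_quantile(history[x], tau))
        history[a].append(arms[a]())
    # recommend the arm with the highest empirical tau-quantile
    return max(arms, key=lambda x: empirical_quantile(history[x], tau))
```

Because the quantile is computed purely through sorting and comparison, the same code works for any totally ordered grade type; a mean-based rule would be meaningless here.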
Pages: 1660 - 1668
Page count: 9
Related Papers (50 in total)
  • [41] Online Multi-Armed Bandits with Adaptive Inference
    Dimakopoulou, Maria
    Ren, Zhimei
    Zhou, Zhengyuan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [42] Multi-Armed Bandits for Adaptive Constraint Propagation
    Balafrej, Amine
    Bessiere, Christian
    Paparrizou, Anastasia
    PROCEEDINGS OF THE TWENTY-FOURTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI), 2015, : 290 - 296
  • [43] Multi-armed linear bandits with latent biases
    Kang, Qiyu
    Tay, Wee Peng
    She, Rui
    Wang, Sijie
    Liu, Xiaoqian
    Yang, Yuan-Rui
    INFORMATION SCIENCES, 2024, 660
  • [44] Algorithms for Differentially Private Multi-Armed Bandits
    Tossou, Aristide C. Y.
    Dimitrakakis, Christos
    THIRTIETH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2016, : 2087 - 2093
  • [45] Combinatorial Multi-armed Bandits for Resource Allocation
    Zuo, Jinhang
    Joe-Wong, Carlee
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021
  • [46] TRANSFER LEARNING FOR CONTEXTUAL MULTI-ARMED BANDITS
    Cai, Changxiao
    Cai, T. Tony
    Li, Hongzhe
    ANNALS OF STATISTICS, 2024, 52 (01): : 207 - 232
  • [47] Quantum Reinforcement Learning for Multi-Armed Bandits
    Liu, Yi-Pei
    Li, Kuo
    Cao, Xi
    Jia, Qing-Shan
    Wang, Xu
    2022 41ST CHINESE CONTROL CONFERENCE (CCC), 2022, : 5675 - 5680
  • [48] Multi-armed bandits in discrete and continuous time
    Kaspi, H
    Mandelbaum, A
    ANNALS OF APPLIED PROBABILITY, 1998, 8 (04): : 1270 - 1290
  • [49] Multi-armed Bandits with Metric Switching Costs
    Guha, Sudipto
    Munagala, Kamesh
    AUTOMATA, LANGUAGES AND PROGRAMMING, PT II, PROCEEDINGS, 2009, 5556 : 496 - +
  • [50] Multiplayer Modeling via Multi-Armed Bandits
    Gray, Robert C.
    Zhu, Jichen
    Ontanon, Santiago
    2021 IEEE CONFERENCE ON GAMES (COG), 2021, : 695 - 702