Optimal Regret Bounds for Collaborative Learning in Bandits

Cited by: 0
Authors
Shidani, Amitis [1 ]
Vakili, Sattar [2 ]
Affiliations
[1] Univ Oxford, Oxford, England
[2] MediaTek Res, Cambridge, England
Source
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237 | 2024 / Vol. 237
Keywords
Multi-agent multi-armed bandit; Collaborative learning;
DOI
Not available
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.
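The central object in the abstract is the "mixed reward": each agent's optimal arm maximizes a weighted average of that arm's expected rewards across all agents, which is why no agent can identify its optimal arm from its own samples alone. A minimal sketch of this definition, with hypothetical weights `W` and per-agent mean rewards `mu` (names and shapes are illustrative assumptions, not the paper's notation):

```python
import numpy as np

# Illustrative setup: M agents, K arms.
# mu[n, k] = agent n's expected reward for arm k (assumed known here for clarity;
# in the bandit setting these must be estimated from samples).
# W[m, n]  = weight agent m places on agent n's rewards.
M, K = 3, 4
rng = np.random.default_rng(0)
mu = rng.uniform(size=(M, K))
W = rng.uniform(size=(M, M))
W /= W.sum(axis=1, keepdims=True)  # normalize rows so each is a weighted average

# Mixed reward of arm k for agent m: sum over n of W[m, n] * mu[n, k].
mixed = W @ mu  # shape (M, K)

# Each agent's optimal arm is the one with the largest expected mixed reward.
optimal_arms = mixed.argmax(axis=1)
print(optimal_arms)
```

Because `mixed[m]` depends on every row of `mu`, agent m needs information about other agents' reward estimates, which is what makes communication through the central controller essential in this model.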
Pages: 17