Optimal Regret Bounds for Collaborative Learning in Bandits

被引:0
|
作者
Shidani, Amitis [1 ]
Vakili, Sattar [2 ]
机构
[1] Univ Oxford, Oxford, England
[2] MediaTek Res, Cambridge, England
来源
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237 | 2024年 / 237卷
关键词
Multi-agent multi-armed bandit; Collaborative learning;
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.
引用
收藏
页数:17
相关论文
共 50 条
  • [31] No Weighted-Regret Learning in Adversarial Bandits with Delays
    Bistritz, Ilai
    Zhou, Zhengyuan
    Cheny, Xi
    Bambos, Nicholas
    Blanchet, Jose
    Journal of Machine Learning Research, 2022, 23
  • [32] Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits
    Li, Yingkai
    Wang, Yining
    Zhou, Yuan
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (01) : 372 - 388
  • [33] Nearly Minimax-Optimal Regret for Linearly Parameterized Bandits
    Li, Yingkai
    Wang, Yining
    Zhou, Yuan
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [34] Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
    Zhang, Zihan
    Jiang, Yuhang
    Zhou, Yuan
    Ji, Xiangyang
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [35] Collaborative Learning with Limited Interaction: Tight Bounds for Distributed Exploration in Multi-Armed Bandits
    Tao, Chao
    Zhang, Qin
    Zhou, Yuan
    2019 IEEE 60TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS 2019), 2019, : 126 - 146
  • [36] Minimax Regret Bounds for Reinforcement Learning
    Azar, Mohammad Gheshlaghi
    Osband, Ian
    Munos, Remi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [37] Variational Regret Bounds for Reinforcement Learning
    Ortner, Ronald
    Gajane, Pratik
    Auer, Peter
    35TH UNCERTAINTY IN ARTIFICIAL INTELLIGENCE CONFERENCE (UAI 2019), 2020, 115 : 81 - 90
  • [38] Regret of Queueing Bandits
    Krishnasamy, Subhashini
    Sen, Rajat
    Johari, Ramesh
    Shakkottai, Sanjay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [39] Optimal Learning for Structured Bandits
    Van Parys, Bart
    Golrezaei, Negin
    MANAGEMENT SCIENCE, 2023, 70 (06) : 3951 - 3998
  • [40] Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits
    Karpov, Nikolai
    Zhang, Qin
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13076 - 13084