Optimal Regret Bounds for Collaborative Learning in Bandits

Citations: 0
Authors
Shidani, Amitis [1 ]
Vakili, Sattar [2 ]
Affiliations
[1] Univ Oxford, Oxford, England
[2] MediaTek Res, Cambridge, England
Source
INTERNATIONAL CONFERENCE ON ALGORITHMIC LEARNING THEORY, VOL 237, 2024
Keywords
Multi-agent multi-armed bandit; Collaborative learning
DOI
Not available
CLC Number
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
We consider regret minimization in a general collaborative multi-agent multi-armed bandit model, in which each agent faces a finite set of arms and may communicate with other agents through a central controller. The optimal arm for each agent in this model is the arm with the largest expected mixed reward, where the mixed reward of each arm is a weighted average of its rewards across all agents, making communication among agents crucial. While near-optimal sample complexities for best arm identification are known under this collaborative model, the question of optimal regret remains open. In this work, we address this problem and propose the first algorithm with order-optimal regret bounds under this collaborative bandit model. Furthermore, we show that only a small constant number of expected communication rounds is needed.
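For readability, a minimal sketch of the mixed-reward objective described in the abstract; the notation below (number of agents M, number of arms K, weights w_{m,n}, per-agent means mu_n(k), horizon T, played arm k_{m,t}) is assumed for illustration and need not match the paper's own symbols:
\[
\mu^{\mathrm{mix}}_m(k) \;=\; \sum_{n=1}^{M} w_{m,n}\,\mu_n(k),
\qquad
k^*_m \;=\; \arg\max_{k \in [K]} \mu^{\mathrm{mix}}_m(k),
\]
where $\mu_n(k)$ is the expected reward of arm $k$ for agent $n$ and $w_{m,n} \ge 0$ with $\sum_{n=1}^{M} w_{m,n} = 1$. Under this sketch, the regret of agent $m$ over horizon $T$ would be
\[
R_m(T) \;=\; \sum_{t=1}^{T} \Bigl( \mu^{\mathrm{mix}}_m(k^*_m) - \mu^{\mathrm{mix}}_m(k_{m,t}) \Bigr),
\]
which makes explicit why communication is needed: agent $m$ cannot estimate $\mu^{\mathrm{mix}}_m(k)$ from its own observations alone when the weights place mass on other agents' rewards.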
Pages: 17