Multi-armed bandits with dependent arms

Cited by: 1
Authors
Singh, Rahul [1 ]
Liu, Fang [2 ]
Sun, Yin [3 ]
Shroff, Ness [2 ]
Affiliations
[1] Indian Inst Sci, Dept ECE, Bengaluru, Karnataka, India
[2] Ohio State Univ, Dept ECE, Columbus, OH USA
[3] Auburn Univ, Dept ECE, Auburn, AL USA
Funding
US National Science Foundation;
Keywords
Multi-armed bandits; Online learning; Sequential decision making; Clustered bandits; Structured bandits; OPTIMIZATION;
DOI
10.1007/s10994-023-06457-z
Chinese Library Classification
TP18 (Artificial Intelligence Theory);
Discipline codes
081104; 0812; 0835; 1405;
Abstract
We study a variant of the multi-armed bandit problem (MABP) which we call MABs with dependent arms. Multiple arms are grouped together to form a cluster, and the reward distributions of arms in the same cluster are known functions of an unknown parameter that is a characteristic of the cluster. Thus, pulling an arm i not only reveals information about its own reward distribution, but also about all arms belonging to the same cluster. This "correlation" among the arms complicates the exploration-exploitation trade-off encountered in the MABP, because the observation dependencies allow us to simultaneously test multiple hypotheses regarding the optimality of an arm. We develop learning algorithms based on the principle of optimism in the face of uncertainty (Lattimore and Szepesvári, Bandit Algorithms, Cambridge University Press, 2020) that know the clusters, and hence utilize these additional side observations appropriately while performing the exploration-exploitation trade-off. We show that the regret of our algorithms grows as $O(K\log T)$, where K is the number of clusters. In contrast, for an algorithm such as vanilla UCB, which does not utilize these dependencies, the regret scales as $O(M\log T)$, where M is the number of arms. When $K \ll M$, i.e., when there are many dependencies among the arms, our proposed algorithm drastically reduces the dependence of the regret on the number of arms.
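To illustrate how shared cluster parameters shrink the number of quantities to learn, the following sketch implements a UCB-style index for a hypothetical special case of the setting described in the abstract: arm i in cluster k has Gaussian reward with mean theta[k] + offset[i], where the offsets are known and only the K cluster parameters theta are unknown. This is not the authors' algorithm, only a minimal instance of the optimism principle on clustered arms; all names and the reward model are illustrative assumptions.

```python
import math
import random

def clustered_ucb(offsets, cluster_of, theta, T, seed=0):
    """UCB on K shared cluster parameters instead of M arm means.

    offsets    -- known per-arm shifts (len M)
    cluster_of -- cluster index of each arm (len M)
    theta      -- true (unknown to the learner) cluster parameters (len K)
    """
    rng = random.Random(seed)
    K, M = len(theta), len(offsets)
    n = [0] * K       # samples observed per cluster parameter
    s = [0.0] * K     # sum of de-offset rewards per cluster
    pulls = [0] * M
    for t in range(1, T + 1):
        if t <= K:
            # Initialization: sample each cluster once via its first arm.
            arm = next(i for i in range(M) if cluster_of[i] == t - 1)
        else:
            def index(i):
                k = cluster_of[i]
                # Confidence radius shrinks with the CLUSTER's pull count,
                # so every pull in a cluster tightens all of its arms.
                bonus = math.sqrt(2.0 * math.log(t) / n[k])
                return s[k] / n[k] + offsets[i] + bonus
            arm = max(range(M), key=index)
        k = cluster_of[arm]
        reward = rng.gauss(theta[k] + offsets[arm], 1.0)
        n[k] += 1
        s[k] += reward - offsets[arm]  # de-offset sample of theta[k]
        pulls[arm] += 1
    return pulls
```

Within a cluster, the index ordering is fixed by the known offsets, so after initialization only the best arm of each cluster is ever pulled; effectively K quantities are explored rather than M, which is the mechanism behind the $O(K\log T)$ versus $O(M\log T)$ scaling.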
Pages: 45-71 (27 pages)