Optimal Learning for Structured Bandits

被引:0
|
作者
Van Parys, Bart [1 ]
Golrezaei, Negin [1 ]
机构
[1] MIT, Sloan Sch Management, Cambridge, MA 02142 USA
关键词
structured bandits; online learning; convex duality; mimicking regret lower bound;
D O I
10.1287/mnsc.2020.02108
中图分类号
C93 [管理学];
学科分类号
12 ; 1201 ; 1202 ; 120202 ;
摘要
We study structured multiarmed bandits, which is the problem of online decision-making under uncertainty in the presence of structural information. In this problem, the decision-maker needs to discover the best course of action despite observing only uncertain rewards over time. The decision-maker is aware of certain convex structural information regarding the reward distributions; that is, the decision-maker knows that the reward distributions of the arms belong to a convex compact set. In the presence of such structural information, the decision-maker then would like to minimize his or her regret by exploiting this information, where the regret is its performance difference against a benchmark policy that knows the best action ahead of time. In the absence of structural information, the classical upper confidence bound (UCB) and Thomson sampling algorithms are well known to suffer minimal regret. However, as recently pointed out by Russo and Van Roy (2018) and Lattimore and Szepesvari (2017), neither algorithm is capable of exploiting structural information that is commonly available in practice. We propose a novel learning algorithm that we call "DUSA," whose regret matches the information-theoretic regret lower bound up to a constant factor and can handle a wide range of structural information. Our algorithm DUSA solves a dual counterpart of the regret lower bound at the empirical reward distribution and follows its suggested play. We show that this idea leads to the first computationally viable learning policy with asymptotic minimal regret for various structural information, including well-known structured bandits such as linear, Lipschitz, and convex bandits and novel structured bandits that have not been studied in the literature because of the lack of a unified and flexible framework.
引用
收藏
页码:3951 / 3998
页数:49
相关论文
共 50 条
  • [21] Global Bandits
    Atan, Onur
    Tekin, Cem
    van der Schaar, Mihaela
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2018, 29 (12) : 5798 - 5811
  • [22] Multi-armed bandits with dependent arms
    Singh, Rahul
    Liu, Fang
    Sun, Yin
    Shroff, Ness
    MACHINE LEARNING, 2024, 113 (01) : 45 - 71
  • [23] Ranked Bandits in Metric Spaces: Learning Diverse Rankings over Large Document Collections
    Slivkins, Aleksandrs
    Radlinski, Filip
    Gollapudi, Sreenivas
    JOURNAL OF MACHINE LEARNING RESEARCH, 2013, 14 : 399 - 436
  • [24] Multi-player Multi-armed Bandits: Decentralized Learning with IID Rewards
    Kalathil, Dileep
    Nayyar, Naumaan
    Jain, Rahul
    2012 50TH ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2012, : 853 - 860
  • [25] Collaborative Filtering Bandits
    Li, Shuai
    Karatzoglou, Alexandros
    Gentile, Claudio
    SIGIR'16: PROCEEDINGS OF THE 39TH INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL, 2016, : 539 - 548
  • [26] Linear Jamming Bandits: Sample-Efficient Learning for Non-Coherent Digital Jamming
    Thornton, Charles E.
    Buehrer, R. Michael
    2022 IEEE MILITARY COMMUNICATIONS CONFERENCE (MILCOM), 2022,
  • [27] Bandits and Experts in Metric Spaces
    Kleinberg, Robert
    Slivkins, Aleksandrs
    Upfal, Eli
    JOURNAL OF THE ACM, 2019, 66 (04)
  • [28] Non-Compliant Bandits
    Kveton, Branislav
    Liu, Yi
    Kruijssen, Johan Matteo
    Nie, Yisu
    PROCEEDINGS OF THE 32ND ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2023, 2023, : 1138 - 1147
  • [29] Malaysian nursing students' satisfaction with structured online learning
    Romli, Muhammad Hibatullah
    Wan Yunus, Farahiyah
    Foong, Chan Choong
    Soh, Kim Lam
    INNOVATIONS IN EDUCATION AND TEACHING INTERNATIONAL, 2024, 61 (03) : 487 - 502
  • [30] Validating an Observation Protocol for Structured Roles in Cooperative Learning
    Fong, Morgan M.
    Butler, Liia
    Chen, Hongxuan
    Herman, Geoffrey L.
    2022 IEEE FRONTIERS IN EDUCATION CONFERENCE, FIE, 2022,