MULTI-ARMED BANDITS WITH COVARIATES: THEORY AND APPLICATIONS

Cited by: 2
Authors
Kim, Dong Woo [1 ]
Lai, Tze Leung [2 ]
Xu, Huanzhong [3 ]
Affiliations
[1] Microsoft Corp, Anal & Expt Team, Redmond, WA 98052 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
Funding
U.S. National Science Foundation
Keywords
Contextual multi-armed bandits; ε-greedy randomization; personalized medicine; recommender system; reinforcement learning; INFORMATION; ALLOCATION; REGRESSION; CONVERGENCE; RATES
DOI
10.5705/ss.202020.0454
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics]
Discipline codes
020208; 070103; 0714
Abstract
"Multi-armed bandits" were introduced as a new direction in the then-nascent field of sequential analysis, which was developed during World War II in response to the need for more efficient testing of anti-aircraft gunnery, and later as a concrete application of dynamic programming and optimal control of Markov decision processes. A comprehensive theory unifying both directions emerged in the 1980s, providing important insights and algorithms for diverse applications in many science, technology, engineering, and mathematics fields. The turn of the millennium marked the onset of a "personalization revolution," ranging from personalized medicine to online personalized advertising and recommender systems (e.g., Netflix's recommendations for movies and TV shows, Amazon's recommendations for products to purchase, and Microsoft's Matchbox recommender). This revolution has required an extension of classical bandit theory to nonparametric contextual bandits, where "contextual" refers to the incorporation of personal information as covariates. Such a theory is developed herein, together with illustrative applications, statistical models, and computational tools for its implementation.
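The contextual setting the abstract describes (arms whose expected rewards depend on covariates, with ε-greedy randomization for exploration) can be illustrated with a minimal sketch. This is not the paper's method, only a generic ε-greedy contextual bandit with a per-arm ridge-regularized linear reward model; all names (`EpsGreedyContextualBandit`, `select`, `update`) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

class EpsGreedyContextualBandit:
    """Generic epsilon-greedy contextual bandit (illustrative sketch only):
    each arm keeps a ridge-regularized least-squares model of reward vs. covariates."""

    def __init__(self, n_arms, dim, epsilon=0.1):
        self.n_arms = n_arms
        self.epsilon = epsilon
        # Per-arm sufficient statistics: A_k = I + sum x x', b_k = sum r x
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select(self, x):
        """Explore uniformly with probability epsilon; otherwise pick the arm
        whose fitted linear model predicts the highest reward for context x."""
        if rng.random() < self.epsilon:
            return int(rng.integers(self.n_arms))
        estimates = [x @ np.linalg.solve(self.A[k], self.b[k])
                     for k in range(self.n_arms)]
        return int(np.argmax(estimates))

    def update(self, arm, x, reward):
        """Fold the observed (context, reward) pair into the chosen arm's model."""
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy simulation: two arms whose true rewards are linear in a 2-d covariate.
bandit = EpsGreedyContextualBandit(n_arms=2, dim=2, epsilon=0.2)
true_theta = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
for _ in range(500):
    x = rng.random(2)
    arm = bandit.select(x)
    reward = x @ true_theta[arm] + 0.1 * rng.normal()
    bandit.update(arm, x, reward)
```

After training, setting `epsilon = 0` and querying the policy shows it has learned to match each context to the arm that is better for it, which is the basic point of incorporating covariates.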
Pages: 2275-2287
Number of pages: 13
Related papers
(50 records in total)
  • [41] Contextual Multi-armed Bandits for Data Caching in Intermittently-Connected Lossy Maritime Networks
    Forero, Pedro A.
    Wakayama, Cherry Y.
    OCEANS 2023 - LIMERICK, 2023,
  • [42] KL-UCB-Based Policy for Budgeted Multi-Armed Bandits with Stochastic Action Costs
    Watanabe, Ryo
    Komiyama, Junpei
    Nakamura, Atsuyoshi
    Kudo, Mineichi
    IEICE TRANSACTIONS ON FUNDAMENTALS OF ELECTRONICS COMMUNICATIONS AND COMPUTER SCIENCES, 2017, E100A (11): : 2470 - 2486
  • [43] Exploring the Benefit of Customizing Feedback Interventions For Educators and Students With Offline Contextual Multi-Armed Bandits
    Yun, Joy
    Nie, Allen
    Brunskill, Emma
    Demszky, Dorottya
    FIFTEENTH INTERNATIONAL CONFERENCE ON LEARNING ANALYTICS & KNOWLEDGE, LAK 2025, 2025, : 944 - 949
  • [44] Multi-armed bandits for adjudicating documents in pooling-based evaluation of information retrieval systems
    Losada, David E.
    Parapar, Javier
    Barreiro, Alvaro
    INFORMATION PROCESSING & MANAGEMENT, 2017, 53 (05) : 1005 - 1025
  • [45] Multi-armed quantum bandits: Exploration versus exploitation when learning properties of quantum states
    Lumbreras, Josep
    Haapasalo, Erkka
    Tomamichel, Marco
    QUANTUM, 2022, 6
  • [46] No DBA? No Regret! Multi-Armed Bandits for Index Tuning of Analytical and HTAP Workloads With Provable Guarantees
    Perera, R. Malinga
    Oetomo, Bastian
    Rubinstein, Benjamin I. P.
    Borovica-Gajic, Renata
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2023, 35 (12) : 12855 - 12872
  • [47] Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond
    Jiang, Daniel
    Luo, Haipeng
    Wang, Chu
    Wang, Yingfei
    KDD '21: PROCEEDINGS OF THE 27TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2021, : 4133 - 4134
  • [48] Multi-armed bandits for bid shading in first-price real-time bidding auctions
    Tilli, Tuomo
    Espinosa-Leal, Leonardo
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2021, 41 (06) : 6111 - 6125
  • [50] Robust control of the multi-armed bandit problem
    Caro, Felipe
    Das Gupta, Aparupa
    ANNALS OF OPERATIONS RESEARCH, 2022, 317 (02) : 461 - 480