MULTI-ARMED BANDITS WITH COVARIATES: THEORY AND APPLICATIONS

Cited by: 2
Authors
Kim, Dong Woo [1 ]
Lai, Tze Leung [2 ]
Xu, Huanzhong [3 ]
Affiliations
[1] Microsoft Corp, Anal & Expt Team, Redmond, WA 98052 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
Funding
National Science Foundation (USA);
Keywords
Contextual multi-armed bandits; ε-greedy randomization; personalized medicine; recommender system; reinforcement learning; INFORMATION; ALLOCATION; REGRESSION; CONVERGENCE; RATES;
DOI
10.5705/ss.202020.0454
CLC Classification
O21 [Probability Theory and Mathematical Statistics]; C8 [Statistics];
Subject Classification Codes
020208 ; 070103 ; 0714 ;
Abstract
"Multi-armed bandits" were introduced as a new direction in the then-nascent field of sequential analysis, developed during World War II in response to the need for more efficient testing of anti-aircraft gunnery, and later as a concrete application of dynamic programming and optimal control of Markov decision processes. A comprehensive theory unifying both directions emerged in the 1980s, providing important insights and algorithms for diverse applications in many science, technology, engineering, and mathematics fields. The turn of the millennium marked the onset of a "personalization revolution," ranging from personalized medicine to online personalized advertising and recommender systems (e.g., Netflix's recommendations for movies and TV shows, Amazon's recommendations for products to purchase, and Microsoft's Matchbox recommender). This has required an extension of classical bandit theory to nonparametric contextual bandits, where "contextual" refers to the incorporation of personal information as covariates. Such a theory is developed herein, together with illustrative applications, statistical models, and computational tools for its implementation.
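The abstract's two central ingredients, covariate (context) information and ε-greedy randomization, can be illustrated with a minimal sketch. This is not the authors' algorithm; it is a generic contextual bandit in which each arm's expected reward is estimated by a per-arm ridge regression on the covariate, and with probability ε an arm is chosen uniformly at random to keep exploring. All names, dimensions, and parameter values below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: K arms, d-dimensional covariate per round, horizon T.
K, d, T, eps = 3, 5, 2000, 0.1

# Unknown true arm parameters, used only to simulate rewards.
theta_true = rng.normal(size=(K, d))

# Per-arm ridge-regression sufficient statistics: A_k = X'X + I, b_k = X'y.
A = np.stack([np.eye(d) for _ in range(K)])
b = np.zeros((K, d))

total_reward = 0.0
for t in range(T):
    x = rng.normal(size=d)                  # observed covariate for this round
    if rng.random() < eps:
        arm = int(rng.integers(K))          # explore: uniform random arm
    else:
        # Exploit: pick the arm with the largest predicted reward x' theta_hat_k.
        preds = [x @ np.linalg.solve(A[k], b[k]) for k in range(K)]
        arm = int(np.argmax(preds))
    reward = theta_true[arm] @ x + rng.normal(scale=0.1)
    A[arm] += np.outer(x, x)                # update only the chosen arm's regression
    b[arm] += reward * x
    total_reward += reward
```

Only the chosen arm's regression is updated each round, which is what makes exploration necessary: without the ε-randomization, arms that look poor early on would never accumulate data and their estimates could never be corrected.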
Pages: 2275-2287
Page count: 13
Related Papers
50 records in total
  • [31] Maximizing Airtime Efficiency for Reliable Broadcast Streams in WMNs with Multi-Armed Bandits
    Perin, Giovanni
    Nophut, David
    Badia, Leonardo
    Fitzek, Frank H. P.
    2020 11TH IEEE ANNUAL UBIQUITOUS COMPUTING, ELECTRONICS & MOBILE COMMUNICATION CONFERENCE (UEMCON), 2020, : 472 - 478
  • [32] Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits
    Gray, Robert C.
    Zhu, Jichen
    Ontanon, Santiago
    2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 312 - 319
  • [33] An Analysis of the Value of Information When Exploring Stochastic, Discrete Multi-Armed Bandits
    Sledge, Isaac J.
    Principe, Jose C.
    ENTROPY, 2018, 20 (03)
  • [34] Scheduling for Massive MIMO With Hybrid Precoding Using Contextual Multi-Armed Bandits
    Mauricio, Weskley V. F.
    Maciel, Tarcisio Ferreira
    Klein, Anja
    Marques Lima, Francisco Rafael
    IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2022, 71 (07) : 7397 - 7413
  • [35] Optimality of Myopic Policy for a Class of Monotone Affine Restless Multi-Armed Bandits
    Mansourifard, Parisa
    Javidi, Tara
    Krishnamachari, Bhaskar
    2012 IEEE 51ST ANNUAL CONFERENCE ON DECISION AND CONTROL (CDC), 2012, : 877 - 882
  • [36] Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates
    Qian, Wei
    Ing, Ching-Kang
    Liu, Ji
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (546) : 970 - 982
  • [37] BADGE: Prioritizing UI Events with Hierarchical Multi-Armed Bandits for Automated UI Testing
    Ran, Dezhi
    Wang, Hao
    Wang, Wenyu
    Xie, Tao
    2023 IEEE/ACM 45TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING, ICSE, 2023, : 894 - 905
  • [38] EXPLOITING SIMILARITY INFORMATION IN REINFORCEMENT LEARNING Similarity Models for Multi-Armed Bandits and MDPs
    Ortner, Ronald
    ICAART 2010: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1: ARTIFICIAL INTELLIGENCE, 2010, : 203 - 210
  • [39] Multi-armed Bandits for Self-distributing Stateful Services across Networking Infrastructures
    Rappa, Frederico Meletti
    Rodrigues-Filho, Roberto
    Panisson, Alison R.
    Marcolino, Leandro Soriano
    Bittencourt, Luiz F.
    PROCEEDINGS OF 2024 IEEE/IFIP NETWORK OPERATIONS AND MANAGEMENT SYMPOSIUM, NOMS 2024, 2024,
  • [40] Interactive Multi-objective Reinforcement Learning in Multi-armed Bandits with Gaussian Process Utility Models
    Roijers, Diederik M.
    Zintgraf, Luisa M.
    Libin, Pieter
    Reymond, Mathieu
    Bargiacchi, Eugenio
    Nowe, Ann
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2020, PT III, 2021, 12459 : 463 - 478