MULTI-ARMED BANDITS WITH COVARIATES: THEORY AND APPLICATIONS

Times cited: 2
Authors
Kim, Dong Woo [1 ]
Lai, Tze Leung [2 ]
Xu, Huanzhong [3 ]
Affiliations
[1] Microsoft Corp, Anal & Expt Team, Redmond, WA 98052 USA
[2] Stanford Univ, Dept Stat, Stanford, CA 94305 USA
[3] Stanford Univ, Inst Computat & Math Engn, Stanford, CA 94305 USA
Funding
US National Science Foundation;
Keywords
Contextual multi-armed bandits; ε-greedy randomization; personalized medicine; recommender system; reinforcement learning; INFORMATION; ALLOCATION; REGRESSION; CONVERGENCE; RATES;
DOI
10.5705/ss.202020.0454
Chinese Library Classification (CLC)
O21 [Probability theory and mathematical statistics]; C8 [Statistics];
Subject classification codes
020208; 070103; 0714;
Abstract
"Multi-armed bandits" were introduced as a new direction in the thennascent field of sequential analysis, developed during World War II in response to the need for more efficient testing of anti-aircraft gunnery, and later as a concrete application of dynamic programming and optimal control of Markov decision processes. A comprehensive theory that unified both directions emerged in the 1980s, providing important insights and algorithms for diverse applications in many science, technology, engineering and mathematics fields. The turn of the millennium marked the onset of a "personalization revolution," from personalized medicine and online personalized advertising and recommender systems (e.g. Netflix's recommendations for movies and TV shows, Amazon's recommendations for products to purchase, and Microsoft's Matchbox recommender). This has required an extension of classical bandit theory to nonparametric contextual bandits, where "contextual" refers to the incorporation of personal information as covariates. Such theory is developed herein, together with illustrative applications, statistical models, and computational tools for its implementation.
Pages: 2275-2287
Page count: 13