Mechanisms with learning for stochastic multi-armed bandit problems

被引:0
作者
Shweta Jain
Satyanath Bhat
Ganesh Ghalme
Divya Padmanabhan
Y. Narahari
机构
[1] Indian Institute of Science,Department of Computer Science and Automation
来源
Indian Journal of Pure and Applied Mathematics | 2016年 / 47卷
关键词
Multi-armed Bandit; mechanism design; learning algorithms;
D O I
暂无
中图分类号
学科分类号
摘要
The multi-armed bandit (MAB) problem is a widely studied problem in machine learning literature in the context of online learning. In this article, our focus is on a specific class of problems namely stochastic MAB problems where the rewards are stochastic. In particular, we emphasize stochastic MAB problems with strategic agents. Dealing with strategic agents warrants the use of mechanism design principles in conjunction with online learning, and leads to non-trivial technical challenges. In this paper, we first provide three motivating problems arising from Internet advertising, crowdsourcing, and smart grids. Next, we provide an overview of stochastic MAB problems and key associated learning algorithms including upper confidence bound (UCB) based algorithms. We provide proofs of important results related to regret analysis of the above learning algorithms. Following this, we present mechanism design for stochastic MAB problems. With the classic example of sponsored search auctions as a backdrop, we bring out key insights in important issues such as regret lower bounds, exploration separated mechanisms, designing truthful mechanisms, UCB based mechanisms, and extension to multiple pull MAB problems. Finally we provide a bird’s eye view of recent results in the area and present a few issues that require immediate future attention.
引用
收藏
页码:229 / 272
页数:43
相关论文
共 20 条
  • [1] Agrawal S.(2012)Analysis of Thompson sampling for the multi-armed bandit problem 25th annual Conference on Learning Theory, COLT’12 23 1-39
  • [2] Goyal N.(2002)Finite-time analysis of the multiarmed bandit problem Journal of Machine Learning 47 235-256
  • [3] Auer P.(2012)Regret analysis of stochastic and nonstochastic multi-armed bandit problems Foundations and Trends in Machine Learning 5 1-122
  • [4] Cesa-Bianchi N.(2008)Foundations of mechanism design: A tutorial - Part 2: Advanced Concepts and Results Sadhana- Indian Academy Proceedings in Engineering Sciences 33 121-174
  • [5] Fischer P.(1985)Asymptotically efficient adaptive allocation rules Advances in Applied Mathematics 6 4-22
  • [6] Bubeck S.(1981)Optimal auction design Mathematics of Operations Research 6 58-73
  • [7] Cesa-Bianchi N.(2013)Dynamic pay-per-action mechanisms and applications to online advertising Operations Research 61 98-111
  • [8] Garg Y. N. D.(1952)Some aspects of the sequential design of experiments Bull. Amer. Math. Soc. 58 527-535
  • [9] Gujar S.(2012)Truthful multi-armed bandit mechanisms for multi-slot sponsored search auctions Current Science 103 1064-1077
  • [10] Lai T. L.(1933)On the likelihood that one unknown probability exceeds another in view of the evidence of two samples Biometrika 25 285-294