Multi-armed Bandit with Additional Observations

Cited by: 0
Authors
Yun D. [1 ]
Ahn S. [2 ]
Proutiere A. [3 ]
Shin J. [2 ]
Yi Y. [2 ]
Affiliations
[1] Naver Corporation, Seongnam-si
Source
Performance Evaluation Review | 2018 / Vol. 46 / Issue 1
Funding
European Research Council
Keywords
additional observations; INF; KL-UCB; multi-armed bandit;
DOI
10.1145/3219617.3219639
Abstract
We study multi-armed bandit (MAB) problems with additional observations, where in each round the decision maker selects an arm to play and may also observe the rewards of additional arms (within a given budget) by paying certain costs. We propose algorithms whose regrets are asymptotically optimal in the stochastic-reward setting and order-optimal in the adversarial-reward setting. © 2018 ACM.
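The setting in the abstract can be illustrated with a toy simulation. The sketch below is NOT the paper's algorithm: it runs a plain UCB1 index policy on Bernoulli arms and, within a hypothetical per-round budget, additionally observes (without playing) the runner-up arms, charging an assumed fixed cost `obs_cost` per extra observation; all names and parameters are illustrative.

```python
import math
import random

def simulate(means, horizon, budget_per_round=1, obs_cost=0.05, seed=0):
    """Toy MAB with additional observations (illustrative sketch only).

    Each round the learner plays the arm with the highest UCB1 index and,
    within the per-round budget, also observes the next-best arms, paying
    obs_cost per observation.  Extra observations update the empirical
    statistics but yield no reward.  Returns the total net reward.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [1] * K                                  # one warm-up sample per arm
    sums = [float(rng.random() < means[a]) for a in range(K)]
    total_reward = 0.0
    t = K
    for _ in range(horizon):
        t += 1
        ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
               for a in range(K)]
        order = sorted(range(K), key=lambda a: ucb[a], reverse=True)
        played = order[0]
        r = float(rng.random() < means[played])       # Bernoulli reward
        total_reward += r
        sums[played] += r
        counts[played] += 1
        # Spend the observation budget on the runner-up arms: their samples
        # sharpen the estimates but cost obs_cost each and are not collected.
        for extra in order[1:1 + budget_per_round]:
            total_reward -= obs_cost
            x = float(rng.random() < means[extra])
            sums[extra] += x
            counts[extra] += 1
    return total_reward
```

With a clearly best arm, the extra observations let the index policy discard suboptimal arms faster, at the price of the observation costs; the paper's algorithms characterize this trade-off optimally.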
Pages: 53-55
Page count: 2
References
18 in total
[1]  
Amin K., Kale S., Tesauro G., Turaga D., Budgeted prediction with expert advice, Proceedings of AAAI, (2015)
[2]  
Audibert J., Bubeck S., Regret bounds and minimax policies under partial monitoring, The Journal of Machine Learning Research, 11, pp. 2785-2836, (2010)
[3]  
Auer P., Cesa-Bianchi N., Freund Y., Schapire R.E., The nonstochastic multiarmed bandit problem, SIAM J. Comput., 32, 1, pp. 48-77, (2002)
[4]  
Caron S., Kveton B., Lelarge M., Bhagat S., Leveraging side observations in stochastic bandits, Proceedings of UAI, (2012)
[5]  
Cesa-Bianchi N., Freund Y., Haussler D., Helmbold D.P., Schapire R.E., Warmuth M.K., How to use expert advice, Journal of the ACM (JACM), 44, 3, pp. 427-485, (1997)
[6]  
Cesa-Bianchi N., Lugosi G., Prediction, Learning, and Games, (2006)
[7]  
Combes R., Jiang C., Srikant R., Bandits with budgets: Regret lower bounds and optimal algorithms, Proceedings of ACM SIGMETRICS, (2015)
[8]  
Combes R., Proutiere A., Yun D., Ok J., Yi Y., Optimal rate sampling in 802.11 systems, Proceedings of IEEE INFOCOM, (2014)
[9]  
Freund Y., Schapire R.E., A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55, 1, pp. 119-139, (1997)
[10]  
Garivier A., Cappe O., The KL-UCB algorithm for bounded stochastic bandits and beyond, Proceedings of COLT, (2011)