Multi-armed Bandit with Additional Observations

Cited by: 0
Authors
Yun D. [1 ]
Ahn S. [2 ]
Proutiere A. [3 ]
Shin J. [2 ]
Yi Y. [2 ]
Affiliations
[1] Naver Corporation, Seongnam-si
Source
Performance Evaluation Review | 2018 / Vol. 46 / Issue 1
Funding
European Research Council
Keywords
additional observations; INF; KL-UCB; multi-armed bandit;
DOI
10.1145/3219617.3219639
Abstract
We study multi-armed bandit (MAB) problems with additional observations, where in each round the decision maker selects an arm to play and may also observe the rewards of additional arms (within a given budget) by paying certain costs. We propose algorithms whose regrets are asymptotically optimal in the stochastic-reward setting and order-optimal in the adversarial-reward setting. © 2018 ACM.
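The setting in the abstract can be illustrated with a toy simulation. The sketch below is NOT the paper's algorithm: it runs a plain UCB1 index policy on Bernoulli arms and, within a hypothetical per-round budget, additionally observes (without playing) the runner-up arms, charging an assumed fixed cost `obs_cost` per extra observation; all names and parameters are illustrative.

```python
import math
import random

def simulate(means, horizon, budget_per_round=1, obs_cost=0.05, seed=0):
    """Toy MAB with additional observations (illustrative sketch only).

    Each round the learner plays the arm with the highest UCB1 index and,
    within the per-round budget, also observes the next-best arms, paying
    obs_cost per observation.  Extra observations update the empirical
    statistics but yield no reward.  Returns the total net reward.
    """
    rng = random.Random(seed)
    K = len(means)
    counts = [1] * K                                  # one warm-up sample per arm
    sums = [float(rng.random() < means[a]) for a in range(K)]
    total_reward = 0.0
    t = K
    for _ in range(horizon):
        t += 1
        ucb = [sums[a] / counts[a] + math.sqrt(2 * math.log(t) / counts[a])
               for a in range(K)]
        order = sorted(range(K), key=lambda a: ucb[a], reverse=True)
        played = order[0]
        r = float(rng.random() < means[played])       # Bernoulli reward
        total_reward += r
        sums[played] += r
        counts[played] += 1
        # Spend the observation budget on the runner-up arms: their samples
        # sharpen the estimates but cost obs_cost each and are not collected.
        for extra in order[1:1 + budget_per_round]:
            total_reward -= obs_cost
            x = float(rng.random() < means[extra])
            sums[extra] += x
            counts[extra] += 1
    return total_reward
```

With a clearly best arm, the extra observations let the index policy discard suboptimal arms faster, at the price of the observation costs; the paper's algorithms characterize this trade-off optimally.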
Pages: 53-55
Page count: 2
References
18 in total
[1]  
Amin K., Kale S., Tesauro G., Turaga D., Budgeted prediction with expert advice, Proceedings of AAAI, (2015)
[2]  
Audibert J., Bubeck S., Regret bounds and minimax policies under partial monitoring, The Journal of Machine Learning Research, 11, pp. 2785-2836, (2010)
[3]  
Auer P., Cesa-Bianchi N., Freund Y., Schapire R.E., The nonstochastic multiarmed bandit problem, SIAM J. Comput., 32, 1, pp. 48-77, (2002)
[4]  
Caron S., Kveton B., Lelarge M., Bhagat S., Leveraging side observations in stochastic bandits, Proceedings of UAI, (2012)
[5]  
Cesa-Bianchi N., Freund Y., Haussler D., Helmbold D.P., Schapire R.E., Warmuth M.K., How to use expert advice, Journal of the ACM (JACM), 44, 3, pp. 427-485, (1997)
[6]  
Cesa-Bianchi N., Lugosi G., Prediction, Learning, and Games, (2006)
[7]  
Combes R., Jiang C., Srikant R., Bandits with budgets: Regret lower bounds and optimal algorithms, Proceedings of ACM SIGMETRICS, (2015)
[8]  
Combes R., Proutiere A., Yun D., Ok J., Yi Y., Optimal rate sampling in 802.11 systems, Proceedings of IEEE INFOCOM, (2014)
[9]  
Freund Y., Schapire R.E., A decision-theoretic generalization of on-line learning and an application to boosting, Journal of Computer and System Sciences, 55, 1, pp. 119-139, (1997)
[10]  
Garivier A., Cappe O., The KL-UCB algorithm for bounded stochastic bandits and beyond, Proceedings of COLT, (2011)