[11]
Allenberg C., Auer P., Gyorfi L., Ottucsak G., Hannan consistency in on-line learning in case of unbounded losses under partial monitoring, Proc. of International Conference on Algorithmic Learning Theory (ALT), pp. 229-243, (2006)
[12]
Allesiardo R., Feraud R., Exp3 with drift detection for the switching bandit problem, Proc. of IEEE International Conference on Data Science and Advanced Analytics (DSAA), (2015)
[13]
Alon N., Cesa-Bianchi N., Dekel O., Koren T., Online learning with feedback graphs: Beyond bandits, Proc. of the 28th Conference on Learning Theory (COLT), 40, pp. 23-35, (2015)
[14]
Anandkumar A., Michael N., Tang A.K., Swami A., Distributed algorithms for learning and cognitive medium access with logarithmic regret, IEEE Journal on Selected Areas in Communications, 29, pp. 731-745, (2011)
[15]
Anantharam V., Varaiya P., Walrand J., Asymptotically efficient allocation rules for the multi-armed bandit problem with multiple plays-Part I: I.I.D. rewards, IEEE Transactions on Automatic Control, 32, pp. 968-975, (1987)
[16]
Anantharam V., Varaiya P., Walrand J., Asymptotically efficient allocation rules for the multi-armed bandit problem with multiple plays-Part II: Markovian rewards, IEEE Transactions on Automatic Control, 32, pp. 977-982, (1987)
[17]
Arrow K., Social Choice and Individual Values, (1951)
[18]
Asawa M., Teneketzis D., Multi-armed bandits with switching penalties, IEEE Transactions on Automatic Control, 41, pp. 328-348, (1996)
[19]
Audibert J., Bubeck S., Minimax policies for adversarial and stochastic bandits, Proc. of the 22nd Annual Conference on Learning Theory (COLT), (2009)
[20]
Audibert J., Bubeck S., Regret bounds and minimax policies under partial monitoring, Journal of Machine Learning Research, pp. 2785-2836, (2010)