43 references in total
- [1] Bubeck S, Cesa-Bianchi N., Regret analysis of stochastic and nonstochastic multi-armed bandit problems[J], Foundations and Trends in Machine Learning, 5, 1, pp. 1-122, (2012)
- [2] Woodroofe M., A one-armed bandit problem with a concomitant variable[J], Journal of the American Statistical Association, 74, 368, pp. 799-806, (1979)
- [3] Li L, Chu W, Langford J, et al., A contextual-bandit approach to personalized news article recommendation[C], Proc of the 19th Int Conf on World Wide Web (WWW), pp. 661-670, (2010)
- [4] Cont R, Bouchaud J P., Herd behavior and aggregate fluctuations in financial markets[J], Macroeconomic Dynamics, 4, 2, pp. 170-196, (2000)
- [5] Roberts J A, Boonstra T W, et al., The heavy tail of the human brain[J], Current Opinion in Neurobiology, 31, pp. 164-172, (2015)
- [6] Abe N, Long P M., Associative reinforcement learning using linear probabilistic concepts[C], Proc of the 16th Int Conf on Machine Learning (ICML), pp. 3-11, (1999)
- [7] Dani V, Hayes T P, Kakade S M., Stochastic linear optimization under bandit feedback[C], Proc of the 21st Annual Conf on Learning Theory (COLT), pp. 355-366, (2008)
- [8] Abbasi-Yadkori Y, Pál D, Szepesvári C., Improved algorithms for linear stochastic bandits[C], Advances in Neural Information Processing Systems 24 (NIPS), pp. 2312-2320, (2011)
- [9] Abbasi-Yadkori Y, Pál D, Szepesvári C., Online-to-confidence-set conversions and application to sparse stochastic bandits[C], Proc of the 15th Int Conf on Artificial Intelligence and Statistics (AISTATS), pp. 1-9, (2012)
- [10] Agrawal S, Goyal N., Thompson sampling for contextual bandits with linear payoffs[C], Proc of the 30th Int Conf on Machine Learning (ICML), pp. 127-135, (2013)