[21]
Audibert J., Munos R., Szepesvari C., Exploration-exploitation trade-off using variance estimates in multi-armed bandits, Theoretical Computer Science, pp. 1876-1902, (2009)
[22]
Audibert J., Bubeck S., Munos R., Best arm identification in multi-armed bandits, Proc. of the 23rd Annual Conference on Learning Theory (COLT), (2010)
[23]
Auer P., Cesa-Bianchi N., Fischer P., Finite-time analysis of the multi-armed bandit problem, Machine Learning, 47, pp. 235-256, (2002)
[24]
Auer P., Cesa-Bianchi N., Freund Y., Schapire R.E., The nonstochastic multi-armed bandit problem, SIAM Journal on Computing, 32, pp. 48-77, (2003)
[25]
Auer P., Ortner R., Szepesvari C., Improved rates for the stochastic continuum-armed bandit problem, Proc. of the 20th Annual Conference on Learning Theory (COLT), pp. 454-468, (2007)
[26]
Aviv Y., Pazgal A., A partially observed Markov decision process for dynamic pricing, Management Science, 51, pp. 1400-1416, (2005)
[27]
Awerbuch B., Kleinberg R., Online linear optimization and adaptive routing, Journal of Computer and System Sciences, pp. 97-114, (2008)
[28]
Badanidiyuru A., Kleinberg R., Slivkins A., Bandits with knapsacks, IEEE 54th Annual Symposium on Foundations of Computer Science (FOCS), pp. 207-216, (2013)
[29]
Banks J., Sundaram R., Switching costs and the Gittins index, Econometrica, 62, pp. 687-694, (1994)
[30]
Bellman R., A problem in the sequential design of experiments, Sankhyā, 16, pp. 221-229, (1956)