213 entries in total
[1]
Abbasi-Yadkori Y., Pal D., Szepesvari C., Improved algorithms for linear stochastic bandits, Proc. of Conference on Neural Information Processing Systems (NeurIPS), pp. 2312-2320, (2011)
[2]
Agarwal A., Foster D.P., Hsu D., Kakade S.M., Rakhlin A., Stochastic convex optimization with bandit feedback, SIAM Journal on Optimization, 23, pp. 213-240, (2013)
[3]
Aghion P., Bolton P., Harris C., Jullien B., Optimal learning by experimentation, The Review of Economic Studies, 58, pp. 621-654, (1991)
[4]
Agrawal R., Sample mean based index policies with O(log n) regret for the multi-armed bandit problem, Advances in Applied Probability, 27, pp. 1054-1078, (1995)
[5]
Agrawal R., The continuum-armed bandit problem, SIAM Journal on Control and Optimization, 33, pp. 1926-1951, (1995)
[6]
Agrawal S., Goyal N., Analysis of Thompson sampling for the multi-armed bandit problem, Proc. of Conference on Learning Theory (COLT), (2012)
[7]
Agrawal S., Goyal N., Further optimal regret bounds for Thompson sampling, Proc. of the 16th International Conference on Artificial Intelligence and Statistics (AISTATS), (2013)
[8]
Agrawal R., Teneketzis D., Certainty equivalence control with forcing: Revisited, Systems and Control Letters, 13, 5, pp. 405-412, (1989)
[9]
Ahmad S.H., Liu M., Javidi T., Zhao Q., Krishnamachari B., Optimality of myopic sensing in multi-channel opportunistic access, IEEE Transactions on Information Theory, 55, pp. 4040-4050, (2009)
[10]
Albert A.E., The sequential design of experiments for infinitely many states of nature, The Annals of Mathematical Statistics, 32, pp. 774-799, (1961)