34 entries in total
[1]
Rusu AA, 2016, Arxiv, arXiv:1606.04671, DOI 10.48550/arXiv.1606.04671
[2]
Abbasi-Yadkori Yasin, 2011, P ADV NEUR INF PROC, V11, P2312
[3]
Linear Thompson sampling revisited, 2017, ELECTRONIC JOURNAL OF STATISTICS, V11, N2, P5165-5197
[4]
Agrawal S, 2013, P 30 INT C INT C MAC, V28
[5]
Antos A, 2008, LECT NOTES ARTIF INT, V5254, P287, DOI 10.1007/978-3-540-87987-9_25
[6]
Brockman G, 2016, Arxiv, arXiv:1606.01540
[7]
Upper-Confidence-Bound Algorithms for Active Learning in Multi-armed Bandits, 2011, ALGORITHMIC LEARNING THEORY, V6925, P189
[8]
Dhariwal Prafulla, 2017, OpenAI Baselines
[9]
Hiraoka Takuya, 2019, Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8-14, 2019, P2615
[10]
Kurutach Thanard, 2018, ICLR