36 entries in total
- [1] [Anonymous], 2011, Neural Information Processing Systems
- [2] [Anonymous], 2008, 22 ANN C NEUR INF PR
- [3] Anthony T., 2019, POLICY GRADIENT SEAR
- [4] Auer P, Cesa-Bianchi N, Fischer P. Finite-time analysis of the multiarmed bandit problem [J]. MACHINE LEARNING, 2002, 47 (2-3): 235-256
- [5] Bansal M., 2018, arXiv preprint arXiv:1812.03079
- [6] Bishop C. M., 1994, Technical Report No. NCRG/94/004.
- [8] Bubeck S, 2011, J MACH LEARN RES, V12, P1655
- [9] Cai P, 2019, ARXIV190512197