[11]
Osband I., 2016, On lower bounds for regret in reinforcement learning
[12]
Tewari A., 2008, Advances in Neural Information Processing Systems, P1505
[13]
Tewari A., 2007, Reinforcement learning in large or unknown MDPs
[14]
Zanette A., 2019, arXiv:1901.00210