Bandits With Heavy Tail

被引:151
作者
Bubeck, Sebastien [1 ]
Cesa-Bianchi, Nicolo [2 ]
Lugosi, Gabor [3 ,4 ]
机构
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08540 USA
[2] Univ Milan, Dipartimento Informat, I-20135 Milan, Italy
[3] ICREA, Dept Econ, Barcelona, Spain
[4] Univ Pompeu Fabra, Barcelona, Spain
关键词
Heavy-tailed distributions; regret bounds; robust estimators; stochastic multi-armed bandit; ALGORITHM; LOCATION;
D O I
10.1109/TIT.2013.2277869
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The stochastic multiarmed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper, we examine the bandit problem under the weaker assumption that the distributions have moments of order 1 + epsilon, for some epsilon is an element of (0, 1]. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator. We also derive matching lower bounds that also show that the best achievable regret deteriorates when epsilon < 1.
引用
收藏
页码:7711 / 7717
页数:7
相关论文
共 20 条