Bandits With Heavy Tail

被引:165
作者
Bubeck, Sebastien [1 ]
Cesa-Bianchi, Nicolo [2 ]
Lugosi, Gabor [3 ,4 ]
机构
[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08540 USA
[2] Univ Milan, Dipartimento Informat, I-20135 Milan, Italy
[3] ICREA, Dept Econ, Barcelona, Spain
[4] Univ Pompeu Fabra, Barcelona, Spain
关键词
Heavy-tailed distributions; regret bounds; robust estimators; stochastic multi-armed bandit; ALGORITHM; LOCATION;
D O I
10.1109/TIT.2013.2277869
中图分类号
TP [自动化技术、计算机技术];
学科分类号
0812 ;
摘要
The stochastic multiarmed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper, we examine the bandit problem under the weaker assumption that the distributions have moments of order 1 + epsilon, for some epsilon is an element of (0, 1]. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator. We also derive matching lower bounds that also show that the best achievable regret deteriorates when epsilon < 1.
引用
收藏
页码:7711 / 7717
页数:7
相关论文
共 20 条
[2]   The space complexity of approximating the frequency moments [J].
Alon, N ;
Matias, Y ;
Szegedy, M .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1999, 58 (01) :137-147
[3]  
[Anonymous], 2012, REGRET ANAL STOCHAST
[4]   Finite-time analysis of the multiarmed bandit problem [J].
Auer, P ;
Cesa-Bianchi, N ;
Fischer, P .
MACHINE LEARNING, 2002, 47 (2-3) :235-256
[5]   ON SOME ROBUST ESTIMATES OF LOCATION [J].
BICKEL, PJ .
ANNALS OF MATHEMATICAL STATISTICS, 1965, 36 (03) :847-858
[6]  
Bubeck S., 2010, THESIS U LILLE 1 LIL
[7]  
Catoni O., 2010, CHALLENGING EMPIRICA
[8]   An optimal algorithm for Monte Carlo estimation [J].
Dagum, P ;
Karp, R ;
Luby, M ;
Ross, S .
SIAM JOURNAL ON COMPUTING, 2000, 29 (05) :1484-1496
[9]   Adaptive sampling methods for scaling up knowledge discovery algorithms [J].
Domingo, C ;
Gavaldà, R ;
Watanabe, O .
DATA MINING AND KNOWLEDGE DISCOVERY, 2002, 6 (02) :131-152
[10]   Algorithm portfolio selection as a bandit problem with unbounded losses [J].
Gagliolo, Matteo ;
Schmidhuber, Jurgen .
ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2011, 61 (02) :49-86