Bandits With Heavy Tail

被引：165

作者：

Bubeck, Sebastien ^{[1
]}

Cesa-Bianchi, Nicolo ^{[2
]}

Lugosi, Gabor ^{[3
,4
]}

机构：

[1] Princeton Univ, Dept Operat Res & Financial Engn, Princeton, NJ 08540 USA

[2] Univ Milan, Dipartimento Informat, I-20135 Milan, Italy

[3] ICREA, Dept Econ, Barcelona, Spain

[4] Univ Pompeu Fabra, Barcelona, Spain

来源：

IEEE TRANSACTIONS ON INFORMATION THEORY | 2013年 / 59卷 / 11期

关键词：

Heavy-tailed distributions; regret bounds; robust estimators; stochastic multi-armed bandit; ALGORITHM; LOCATION;

D O I：

10.1109/TIT.2013.2277869

中图分类号：

TP [自动化技术、计算机技术];

学科分类号：

0812 ;

摘要：

The stochastic multiarmed bandit problem is well understood when the reward distributions are sub-Gaussian. In this paper, we examine the bandit problem under the weaker assumption that the distributions have moments of order 1 + epsilon, for some epsilon is an element of (0, 1]. Surprisingly, moments of order 2 (i.e., finite variance) are sufficient to obtain regret bounds of the same order as under sub-Gaussian reward distributions. In order to achieve such regret, we define sampling strategies based on refined estimators of the mean such as the truncated empirical mean, Catoni's M-estimator, and the median-of-means estimator. We also derive matching lower bounds that also show that the best achievable regret deteriorates when epsilon < 1.

引用

页码：7711 / 7717

页数：7

共 20 条

[1] SAMPLE-MEAN BASED INDEX POLICIES WITH O(LOG-N) REGRET FOR THE MULTIARMED BANDIT PROBLEM [J].

AGRAWAL, R .

ADVANCES IN APPLIED PROBABILITY, 1995, 27 (04) :1054-1078

[2] The space complexity of approximating the frequency moments [J].

Alon, N ;

Matias, Y ;

Szegedy, M .

JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 1999, 58 (01) :137-147

[3]

[Anonymous], 2012, REGRET ANAL STOCHAST

[4] Finite-time analysis of the multiarmed bandit problem [J].

Auer, P ;

Cesa-Bianchi, N ;

Fischer, P .

MACHINE LEARNING, 2002, 47 (2-3) :235-256

[5] ON SOME ROBUST ESTIMATES OF LOCATION [J].

BICKEL, PJ .

ANNALS OF MATHEMATICAL STATISTICS, 1965, 36 (03) :847-858

[6]

Bubeck S., 2010, THESIS U LILLE 1 LIL

[7]

Catoni O., 2010, CHALLENGING EMPIRICA

[8] An optimal algorithm for Monte Carlo estimation [J].

Dagum, P ;

Karp, R ;

Luby, M ;

Ross, S .

SIAM JOURNAL ON COMPUTING, 2000, 29 (05) :1484-1496

[9] Adaptive sampling methods for scaling up knowledge discovery algorithms [J].

Domingo, C ;

Gavaldà, R ;

Watanabe, O .

DATA MINING AND KNOWLEDGE DISCOVERY, 2002, 6 (02) :131-152

[10] Algorithm portfolio selection as a bandit problem with unbounded losses [J].

Gagliolo, Matteo ;

Schmidhuber, Jurgen .

ANNALS OF MATHEMATICS AND ARTIFICIAL INTELLIGENCE, 2011, 61 (02) :49-86

← 1 2 →