Near-Optimal Regret Bounds for Thompson Sampling

被引：59

作者：

Agrawal, Shipra ^{[1
]}

Goyal, Navin ^{[1
,2
]}

机构：

[1] Microsoft Res, 9 Lavelle Rd, Bengaluru 560001, Karnataka, India

[2] Columbia Univ, Dept Ind Engn & Operat Res, 500 West 120th St,Mudd 423, New York, NY 10027 USA

来源：

JOURNAL OF THE ACM | 2017年 / 64卷 / 05期

关键词：

Multi-armed bandits; PRINCIPLE;

D O I：

10.1145/3088510

中图分类号：

TP3 [计算技术、计算机技术];

学科分类号：

0812 ;

摘要：

Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is a randomized algorithm based on Bayesian ideas and has recently generated significant interest after several studies demonstrated that it has favorable empirical performance compared to the state-of-the-art methods. In this article, a novel and almost tight martingale-based regret analysis for Thompson Sampling is presented. Our technique simultaneously yields both problem-dependent and problem-independent bounds: (1) the first near-optimal problem-independent bound of O(root NT ln T) on the expected regret and (2) the optimal problem-dependent bound of (1 + epsilon) Sigma i ln T/d(mu(i), mu(1)) + O(N/epsilon(2)) on the expected regret (this bound was first proven by Kaufmann et al. (2012b)). Our technique is conceptually simple and easily extends to distributions other than the Beta distribution used in the original TS algorithm. For the version of TS that uses Gaussian priors, we prove a problem-independent bound of O(root NT ln N) on the expected regret and show the optimality of this bound by providing a matching lower bound. This is the first lower bound on the performance of a natural version of Thompson Sampling that is away from the general lower bound of Omega(root NT) for the multiarmed bandit problem.

引用

页数：24

共 37 条

[31] Measurement uncertainty relations: characterising optimal error bounds for qubits
Bullock, T.
Busch, P.
JOURNAL OF PHYSICS A-MATHEMATICAL AND THEORETICAL, 2018, 51 (28)
[32] Model reduction algorithms for optimal control and importance sampling of diffusions
Hartmann, Carsten
Schuette, Christof
Zhang, Wei
NONLINEARITY, 2016, 29 (08) : 2298 - 2326
[33] Optimal Rate Sampling in 802.11 Systems: Theory, Design, and Implementation
Combes, Richard
Ok, Jungseul
Proutiere, Alexandre
Yun, Donggyu
Yi, Yung
IEEE TRANSACTIONS ON MOBILE COMPUTING, 2019, 18 (05) : 1145 - 1158
[34] The achievable region method in the optimal control of queueing systems; Formulations, bounds and policies
Bertsimas, D
QUEUEING SYSTEMS, 1995, 21 (3-4) : 337 - 389
[35] Mixed-Integer DAE Optimal Control Problems: Necessary Conditions and Bounds
Gerdts, Matthias
Sager, Sebastian
CONTROL AND OPTIMIZATION WITH DIFFERENTIAL-ALGEBRAIC CONSTRAINTS, 2012, (23): : 189 - +
[36] Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration
Parambath, Shameem Puthiya
Anagnostopoulos, Christos
Alfahad, Saleh Abdullah M.
DATA MINING AND KNOWLEDGE DISCOVERY, 2025, 39 (01)
[37] Necessary condition for near optimal control of linear forward-backward stochastic differential equations
Zhang, Liangquan
Huang, Jianhui
Li, Xun
INTERNATIONAL JOURNAL OF CONTROL, 2015, 88 (08) : 1594 - 1608

← 1 2 3 4 →