Improved Regret Bounds for Bandit Combinatorial Optimization

Cited: 0
Authors
Ito, Shinji [1 ,2 ]
Hatano, Daisuke [3 ]
Sumita, Hanna [4 ]
Takemura, Kei [1 ]
Fukunaga, Takuro [3 ,5 ,6 ]
Kakimura, Naonori [7 ]
Kawarabayashi, Ken-ichi [8 ]
Affiliations
[1] NEC Corp Ltd, Tokyo, Japan
[2] Univ Tokyo, Tokyo, Japan
[3] RIKEN AIP, Tokyo, Japan
[4] Tokyo Metropolitan Univ, Tokyo, Japan
[5] Chuo Univ, Hachioji, Tokyo, Japan
[6] JST PRESTO, Tokyo, Japan
[7] Keio Univ, Tokyo, Japan
[8] Natl Inst Informat, Tokyo, Japan
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Bandit combinatorial optimization is a bandit framework in which, in each round, a player chooses an action from a given finite set $A \subseteq \{0, 1\}^d$ and incurs a loss equal to the inner product of the chosen action and an unobservable loss vector in $\mathbb{R}^d$. In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. [8] obtained a regret lower bound of $\Omega(\sqrt{dk^3 T / \log T})$, where $k$ is the maximum $\ell_1$-norm of the action vectors and $T$ is the number of rounds. This lower bound was achieved by considering a continuous, strongly correlated distribution of losses. Our main contribution is to improve this lower bound to $\Omega(\sqrt{dk^3 T})$, i.e., to remove the $\sqrt{\log T}$ factor, by means of strongly correlated losses with binary values. The improved bound yields better regret lower bounds for three specific instances of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit. In particular, the bound obtained for the bandit ranking resolves an open problem raised in [8]. In addition, we demonstrate that the problem becomes easier when correlations among the entries of the loss vectors are removed: if each entry of the loss vector is an independent random variable, then one can achieve a regret of $\tilde{O}(\sqrt{dk^2 T})$, which is a factor of $\sqrt{k}$ smaller than the lower bound above. These results indicate that correlation among losses is the source of the large regret.
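To make the interaction protocol behind these bounds concrete, the following is a minimal illustrative Python sketch, not taken from the paper: it simulates bandit combinatorial optimization with a hypothetical uniform-random player and i.i.d. Bernoulli losses (the "independent entries" regime mentioned in the abstract), and measures regret against the best fixed action in hindsight. The problem sizes, the action set (all 0/1 vectors with exactly k ones), and the loss distribution are assumptions made purely for illustration.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not from the paper).
d, k, T = 6, 2, 10_000

# Action set A ⊆ {0,1}^d: all vectors with exactly k ones, so ||a||_1 = k.
# Explicit enumeration is feasible only for tiny d; it is used here for clarity.
actions = []
for idx in combinations(range(d), k):
    a = np.zeros(d)
    a[list(idx)] = 1.0
    actions.append(a)
actions = np.array(actions)

# Hypothetical stochastic environment: each loss entry is an independent Bernoulli.
mean_loss = rng.uniform(0.2, 0.8, size=d)

total_player_loss = 0.0
total_action_loss = np.zeros(len(actions))  # cumulative loss of every fixed action

for t in range(T):
    # Bandit feedback: the player observes only the scalar loss <a_t, ell_t>,
    # never the loss vector ell_t itself.
    a_t = actions[rng.integers(len(actions))]   # naive uniform-random player
    ell_t = rng.binomial(1, mean_loss).astype(float)
    total_player_loss += a_t @ ell_t
    total_action_loss += actions @ ell_t        # bookkeeping for regret only

# Regret: player's cumulative loss minus that of the best fixed action in hindsight.
regret = total_player_loss - total_action_loss.min()
print(f"regret of uniform-random player over T={T} rounds: {regret:.1f}")
```

Any actual bandit algorithm would replace the uniform-random choice; the sketch only fixes the interaction protocol and the regret measure to which the lower and upper bounds above refer.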
Pages: 10