Improved Regret Bounds for Bandit Combinatorial Optimization

Cited: 0
Authors
Ito, Shinji [1 ,2 ]
Hatano, Daisuke [3 ]
Sumita, Hanna [4 ]
Takemura, Kei [1 ]
Fukunaga, Takuro [3 ,5 ,6 ]
Kakimura, Naonori [7 ]
Kawarabayashi, Ken-ichi [8 ]
Affiliations
[1] NEC Corp Ltd, Tokyo, Japan
[2] Univ Tokyo, Tokyo, Japan
[3] RIKEN AIP, Tokyo, Japan
[4] Tokyo Metropolitan Univ, Tokyo, Japan
[5] Chuo Univ, Hachioji, Tokyo, Japan
[6] JST PRESTO, Tokyo, Japan
[7] Keio Univ, Tokyo, Japan
[8] Natl Inst Informat, Tokyo, Japan
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019) | 2019 / Vol. 32
Keywords
DOI
Not available
Chinese Library Classification
TP18 [Theory of Artificial Intelligence];
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Bandit combinatorial optimization is a bandit framework in which, in each round, a player chooses an action from a given finite set $A \subseteq \{0, 1\}^d$ and incurs a loss equal to the inner product of the chosen action and an unobservable loss vector in $\mathbb{R}^d$. In this paper, we aim to reveal the property that makes bandit combinatorial optimization hard. Recently, Cohen et al. [8] obtained a regret lower bound of $\Omega(\sqrt{dk^3 T / \log T})$, where $k$ is the maximum $\ell_1$-norm of the action vectors and $T$ is the number of rounds. This lower bound was achieved by considering a continuous, strongly correlated distribution of losses. Our main contribution is to improve this lower bound to $\Omega(\sqrt{dk^3 T})$, i.e., to remove the $\sqrt{\log T}$ factor, by means of strongly correlated losses with binary values. The improved bound yields better regret lower bounds for three specific instances of bandit combinatorial optimization: the multitask bandit, the bandit ranking, and the multiple-play bandit. In particular, the bound obtained for the bandit ranking resolves an open problem raised in [8]. In addition, we demonstrate that the problem becomes easier when correlations among the entries of the loss vectors are removed: if each entry of the loss vector is an independent random variable, then one can achieve a regret of $\tilde{O}(\sqrt{dk^2 T})$, which is a factor of $\sqrt{k}$ smaller than the lower bound above. These results indicate that correlation among losses is the source of the large regret.
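To make the interaction protocol behind these bounds concrete, the following is a minimal illustrative Python sketch, not taken from the paper: it simulates bandit combinatorial optimization with a hypothetical uniform-random player and i.i.d. Bernoulli losses (the "independent entries" regime mentioned in the abstract), and measures regret against the best fixed action in hindsight. The problem sizes, the action set (all 0/1 vectors with exactly k ones), and the loss distribution are assumptions made purely for illustration.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

# Illustrative problem sizes (assumptions, not from the paper).
d, k, T = 6, 2, 10_000

# Action set A ⊆ {0,1}^d: all vectors with exactly k ones, so ||a||_1 = k.
# Explicit enumeration is feasible only for tiny d; it is used here for clarity.
actions = []
for idx in combinations(range(d), k):
    a = np.zeros(d)
    a[list(idx)] = 1.0
    actions.append(a)
actions = np.array(actions)

# Hypothetical stochastic environment: each loss entry is an independent Bernoulli.
mean_loss = rng.uniform(0.2, 0.8, size=d)

total_player_loss = 0.0
total_action_loss = np.zeros(len(actions))  # cumulative loss of every fixed action

for t in range(T):
    # Bandit feedback: the player observes only the scalar loss <a_t, ell_t>,
    # never the loss vector ell_t itself.
    a_t = actions[rng.integers(len(actions))]   # naive uniform-random player
    ell_t = rng.binomial(1, mean_loss).astype(float)
    total_player_loss += a_t @ ell_t
    total_action_loss += actions @ ell_t        # bookkeeping for regret only

# Regret: player's cumulative loss minus that of the best fixed action in hindsight.
regret = total_player_loss - total_action_loss.min()
print(f"regret of uniform-random player over T={T} rounds: {regret:.1f}")
```

Any actual bandit algorithm would replace the uniform-random choice; the sketch only fixes the interaction protocol and the regret measure to which the lower and upper bounds above refer.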
Pages: 10