Improved Regret Bounds of Bilinear Bandits using Action Space Analysis

被引：0

作者：

Jang, Kyoungseok ^{[1
]}

Jun, Kwang-Sung ^{[2
]}

Yun, Se-Young ^{[3
]}

Kang, Wanmo ^{[1
]}

机构：

[1] Korea Adv Inst Sci & Technol, Dept Math Sci, Daejeon, South Korea

[2] Univ Arizona, Dept Computat, Tucson, AZ 85721 USA

[3] Korea Adv Inst Sci & Technol, Grad Sch AI, Daejeon, South Korea

来源：

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139 | 2021年 / 139卷

基金：

新加坡国家研究基金会;

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We consider the bilinear bandit problem where the learner chooses a pair of arms, each from two different action spaces of dimension d(1) and d(2), respectively. The learner then receives a reward whose expectation is a bilinear function of the two chosen arms with an unknown matrix parameter Theta* is an element of R with rank r. Despite abundant applications such as drug discovery, the optimal regret rate is unknown for this problem, though it was conjectured to be (O) over tilde ( root d)rT) by Jun et al. (2019) where (O) over tilde ignores polylogarithmic factors in T. In this paper, we make progress towards closing the gap between the upper and lower bound on the optimal regret. First, we reject the conjecture above by proposing algorithms that achieve the regret (O) over tilde ( root d)rT) using the fact that the action space dimension O(d) is significantly lower than the matrix parameter dimension O(d). Second, we additionally devise an algorithm with better empirical performance than previous algorithms.

引用

页数：11

共 40 条

[1] Abbasi-Yadkori Y., 2012, Artificial Intelligence and Statistics, P1
[2] Abbasi-Yadkori Y, 2011, Advances in neural information processing systems, V24, P2312
[3] [Anonymous], 2009, Advances in neural information processing systems
[4] Auer P, 2003, SIAM J COMPUT, V32, P48, DOI 10.1137/S0097539701398375
[5] Auer P., 2002, Journal of Machine Learning Research, V3, P397, DOI DOI 10.4271/610369
[6] Beygelzimer A., 2011, P 14 INT C ART INT S, P19
[7] Bhojanapalli Srinadh, 2016, Advances in Neural Information Processing Systems, V29
[8] Brochu E., 2010, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning
[9] A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization
Burer, S
Monteiro, RDC
[J]. MATHEMATICAL PROGRAMMING, 2003, 95 (02) : 329 - 357
[10] Tight Oracle Inequalities for Low-Rank Matrix Recovery From a Minimal Number of Noisy Random Measurements
Candes, Emmanuel J.
Plan, Yaniv
[J]. IEEE TRANSACTIONS ON INFORMATION THEORY, 2011, 57 (04) : 2342 - 2359

← 1 2 3 4 →