Regret lower bound and optimal algorithm for high-dimensional contextual linear bandit

被引:2
|
作者
Li, Ke [1 ]
Yang, Yun [1 ]
Narisetty, Naveen N. [1 ]
机构
[1] Univ Illinois, Dept Stat, Champaign, IL 61820 USA
来源
ELECTRONIC JOURNAL OF STATISTICS | 2021年 / 15卷 / 02期
关键词
Contextual linear bandit; high-dimension; minimax regret; sparsity; upper confidence bound; VARIABLE SELECTION;
D O I
10.1214/21-EJS1909
中图分类号
O21 [概率论与数理统计]; C8 [统计学];
学科分类号
020208 ; 070103 ; 0714 ;
摘要
In this paper, we consider the multi-armed bandit problem with high-dimensional features. First, we prove a minimax lower bound, O((log d)(alpha+1/2) T (1-alpha/2) + logT), for the cumulative regret, in terms of horizon T, dimension d and a margin parameter alpha is an element of [0, 1], which controls the separation between the optimal and the sub-optimal arms. This new lower bound unifies existing regret bound results that have different dependencies on T due to the use of different values of margin parameter a explicitly implied by their assumptions. Second, we propose a simple and computationally efficient algorithm inspired by the general Upper Confidence Bound (UCB) strategy that achieves a regret upper bound matching the lower bound. The proposed algorithm uses a properly centered l(1)-ball as the confidence set in contrast to the commonly used ellipsoid confidence set. In addition, the algorithm does not require any forced sampling step and is thereby adaptive to the practically unknown margin parameter. Simulations and a real data analysis are conducted to compare the proposed method with existing ones in the literature.
引用
收藏
页码:5652 / 5695
页数:44
相关论文
共 50 条
  • [1] Copeland Dueling Bandit Problem: Regret Lower Bound, Optimal Algorithm, and Computationally Efficient Algorithm
    Komiyama, Junpei
    Honda, Junya
    Nakagawa, Hiroshi
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [2] High-dimensional Contextual Bandit Problem without Sparsity
    Komiyama, Junpei
    Imaizumi, Masaaki
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [3] Linear Upper Confidence Bound Algorithm for Contextual Bandit Problem with Piled Rewards
    Huang, Kuan-Hao
    Lin, Hsuan-Tien
    ADVANCES IN KNOWLEDGE DISCOVERY AND DATA MINING, PAKDD 2016, PT II, 2016, 9652 : 143 - 155
  • [4] HIGH DIMENSIONAL STOCHASTIC LINEAR CONTEXTUAL BANDIT WITH MISSING COVARIATES
    Jang, Byoungwook
    Nepper, Julia
    Chevrette, Marc
    Handelsman, Jo
    Hero, Alfred O., III
    2022 IEEE 32ND INTERNATIONAL WORKSHOP ON MACHINE LEARNING FOR SIGNAL PROCESSING (MLSP), 2022,
  • [5] Regret Lower Bound and Optimal Algorithm in Finite Stochastic Partial Monitoring
    Komiyama, Junpei
    Honda, Junya
    Nakagawa, Hiroshi
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 28 (NIPS 2015), 2015, 28
  • [6] Efficient and Robust High-Dimensional Linear Contextual Bandits
    Chen, Cheng
    Luo, Luo
    Zhang, Weinan
    Yu, Yong
    Lian, Yijiang
    PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2020, : 4259 - 4265
  • [7] An Improved Regret Bound for Thompson Sampling in the Gaussian Linear Bandit Setting
    Kalkanli, Cem
    Ozgur, Ayfer
    2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 2783 - 2788
  • [8] Thompson Sampling for High-Dimensional Sparse Linear Contextual Bandits
    Chakraborty, Sunrit
    Roy, Saptarshi
    Tewari, Ambuj
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 202, 2023, 202
  • [9] Near-Optimal Regret in Linear MDPs with Aggregate Bandit Feedback
    Cassel, Asaf
    Luo, Haipeng
    Rosenberg, Aviv
    Sotnikov, Dmitry
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, 2024, 235
  • [10] Adaptive Algorithm for Multi-Armed Bandit Problem with High-Dimensional Covariates
    Qian, Wei
    Ing, Ching-Kang
    Liu, Ji
    JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2024, 119 (546) : 970 - 982