Non-stationary Bandits with Knapsacks

被引:0
|
作者
Liu, Shang [1 ]
Jiang, Jiashuo [2 ]
Li, Xiaocheng [1 ]
机构
[1] Imperial Coll London, Imperial Coll Business Sch, London, England
[2] NYU, Stern Sch Business, New York, NY USA
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-arm bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time, the decision maker/player chooses to play an arm, and s/he will receive a reward and consume certain amount of resource from each of the multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to some knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or adversarial environment. Our paper considers a non-stationary environment which continuously interpolates these two extremes. We first show that the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for a sublinear regret due to the presence of the constraints, and then we propose a new notion of global non-stationarity measure. We employ both non-stationarity measures to derive upper and lower bounds for the problem. Our results are based on a primal-dual analysis of the underlying linear programs and highlight the interplay between the constraints and the non-stationarity. Finally, we also extend the non-stationarity measure to the problem of online convex optimization with constraints and obtain new regret bounds accordingly.
引用
收藏
页数:11
相关论文
共 50 条
  • [1] Approximately Stationary Bandits with Knapsacks
    Fikioris, Giannis
    Tardos, Eva
    THIRTY SIXTH ANNUAL CONFERENCE ON LEARNING THEORY, VOL 195, 2023, 195
  • [2] Unifying Clustered and Non-stationary Bandits
    Li, Chuanhao
    Wu, Qingyun
    Wang, Hongning
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [3] Non-stationary Bandits with Heavy Tail
    Pan, Weici
    Liu, Zhenhua
    Performance Evaluation Review, 2024, 52 (02): : 33 - 35
  • [4] Cascading Non-Stationary Bandits: Online Learning to Rank in the Non-Stationary Cascade Model
    Li, Chang
    de Rijke, Maarten
    PROCEEDINGS OF THE TWENTY-EIGHTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2019, : 2859 - 2865
  • [5] A Simple Approach for Non-stationary Linear Bandits
    Zhao, Peng
    Zhang, Lijun
    Jiang, Yuan
    Zhou, Zhi-Hua
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 108, 2020, 108 : 746 - 754
  • [6] Learning Contextual Bandits in a Non-stationary Environment
    Wu, Qingyun
    Iyer, Naveen
    Wang, Hongning
    ACM/SIGIR PROCEEDINGS 2018, 2018, : 495 - 504
  • [7] Weighted Linear Bandits for Non-Stationary Environments
    Russac, Yoan
    Vernade, Claire
    Cappe, Olivier
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [8] Competing Bandits in Non-Stationary Matching Markets
    Ghosh, Avishek
    Sankararaman, Abishek
    Ramchandran, Kannan
    Javidi, Tara
    Mazumdar, Arya
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2024, 70 (04) : 2831 - 2850
  • [9] Weighted Gaussian Process Bandits for Non-stationary Environments
    Deng, Yuntian
    Zhou, Xingyu
    Kim, Baekjin
    Tewari, Ambuj
    Gupta, Abhishek
    Shroff, Ness
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 151, 2022, 151
  • [10] Stochastic Bandits with Graph Feedback in Non-Stationary Environments
    National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing
    210023, China
    不详
    100102, China
    AAAI Conf. Artif. Intell., AAAI, 1600, (8758-8766): : 8758 - 8766