Non-stationary Bandits with Knapsacks

Cited: 0
Authors
Liu, Shang [1 ]
Jiang, Jiashuo [2 ]
Li, Xiaocheng [1 ]
Affiliations
[1] Imperial Coll London, Imperial Coll Business Sch, London, England
[2] NYU, Stern Sch Business, New York, NY USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-armed bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time step, the decision maker plays an arm, receives a reward, and consumes a certain amount of each of multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or an adversarial environment. Our paper considers a non-stationary environment that continuously interpolates between these two extremes. We first show that, due to the presence of the constraints, the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for a sublinear regret, and we then propose a new notion of global non-stationarity measure. We employ both non-stationarity measures to derive upper and lower bounds for the problem. Our results are based on a primal-dual analysis of the underlying linear programs and highlight the interplay between the constraints and the non-stationarity. Finally, we extend the non-stationarity measure to the problem of online convex optimization with constraints and obtain new regret bounds accordingly.
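The BwK interaction protocol described in the abstract (play an arm, receive a reward, consume resources, stop when a budget is exhausted) can be sketched as a small simulation. The instance parameters and the UCB-style baseline below are hypothetical illustrations of the problem setting only, not the algorithm proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stationary BwK instance: K arms, d resource types.
K, d = 3, 2
T = 1000                           # finite horizon
B = np.array([120.0, 150.0])       # knapsack budget per resource type

# Mean rewards and mean consumptions (unknown to the player).
mean_reward = np.array([0.6, 0.5, 0.3])
mean_cost = np.array([[0.30, 0.10],
                      [0.10, 0.20],
                      [0.05, 0.05]])

def play(arm):
    """Draw a Bernoulli reward and a Bernoulli consumption per resource."""
    r = rng.binomial(1, mean_reward[arm])
    c = rng.binomial(1, mean_cost[arm])   # vector of d consumptions
    return r, c

# Generic UCB loop with budget stopping (a baseline, NOT the paper's method).
pulls = np.zeros(K)
reward_sum = np.zeros(K)
spent = np.zeros(d)
total_reward = 0.0

for t in range(T):
    if np.any(spent >= B):        # stop once any resource is exhausted
        break
    if t < K:
        arm = t                   # play each arm once to initialize
    else:
        ucb = reward_sum / pulls + np.sqrt(2 * np.log(t + 1) / pulls)
        arm = int(np.argmax(ucb))
    r, c = play(arm)
    pulls[arm] += 1
    reward_sum[arm] += r
    spent += c
    total_reward += r

print(total_reward)
```

In the non-stationary setting studied by the paper, `mean_reward` and `mean_cost` would instead drift over time, and the point of the analysis is how much such drift (measured by a variation budget or by the proposed global non-stationarity measure) degrades achievable regret.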
Pages: 11
Related papers
50 results
  • [21] A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits
    Abbasi-Yadkori, Yasin
    Gyorgy, Andras
    Lazic, Nevena
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [22] Stochastic Bandits With Non-Stationary Rewards: Reward Attack and Defense
    Yang, Chenye
    Liu, Guanlin
    Lai, Lifeng
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 5007 - 5020
  • [23] Bandits with Knapsacks and Predictions
    Drago, Davide
    Celli, Andrea
    Elias, Marek
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 1189 - 1206
  • [24] Adversarial Bandits with Knapsacks
    Immorlica, Nicole
    Sankararaman, Karthik Abinav
    Schapire, Robert
    Slivkins, Aleksandrs
    2019 IEEE 60TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS 2019), 2019, : 202 - 219
  • [25] Adversarial Bandits with Knapsacks
    Immorlica, Nicole
    Sankararaman, Karthik
    Schapire, Robert
    Slivkins, Aleksandrs
    JOURNAL OF THE ACM, 2022, 69 (06)
  • [26] Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
    Saha, Aadirupa
    Gupta, Shubham
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 19027 - 19049
  • [27] Non-stationary Continuum-armed Bandits for Online Hyperparameter Optimization
    Lu, Shiyin
    Zhou, Yu-Hang
    Shi, Jing-Cheng
    Zhu, Wenya
    Yu, Qingtao
    Chen, Qing-Guo
    Da, Qing
    Zhang, Lijun
    WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022, : 618 - 627
  • [28] Bandits with Knapsacks (Extended Abstract)
    Badanidiyuru, Ashwinkumar
    Kleinberg, Robert
    Slivkins, Aleksandrs
    2013 IEEE 54TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), 2013, : 207 - 216
  • [29] On Logarithmic Regret for Bandits with Knapsacks
    Ren, Wenbo
    Liu, Jia
    Shroff, Ness B.
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021
  • [30] Linear Contextual Bandits with Knapsacks
    Agrawal, Shipra
    Devanur, Nikhil R.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29