Non-stationary Bandits with Knapsacks

Cited by: 0
Authors
Liu, Shang [1]
Jiang, Jiashuo [2]
Li, Xiaocheng [1]
Affiliations
[1] Imperial Coll London, Imperial Coll Business Sch, London, England
[2] NYU, Stern Sch Business, New York, NY USA
Keywords
DOI
None
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-armed bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time step, the decision maker/player chooses an arm to play, receives a reward, and consumes a certain amount of each of multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or an adversarial environment. Our paper considers a non-stationary environment that continuously interpolates between these two extremes. We first show that, due to the presence of the constraints, the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for achieving sublinear regret, and we then propose a new notion of global non-stationarity measure. We employ both non-stationarity measures to derive upper and lower bounds for the problem. Our results are based on a primal-dual analysis of the underlying linear programs and highlight the interplay between the constraints and the non-stationarity. Finally, we extend the non-stationarity measure to the problem of online convex optimization with constraints and obtain new regret bounds accordingly.
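The interaction protocol described in the abstract can be sketched as follows. This is a minimal illustrative simulation of the BwK loop only, not the algorithm analyzed in the paper: the arm statistics, the epsilon-greedy exploration rule, and the deterministic rewards are all simplifying assumptions made here for illustration.

```python
import random

def run_bwk(arms, budgets, horizon, eps=0.1, seed=0):
    """Simulate the BwK protocol: at each round the player pulls an arm,
    collects a reward, and consumes some of each resource type; play stops
    when the horizon ends or any resource budget is exhausted.

    arms: list of (mean_reward, [cost_per_resource]) pairs (assumed known
    stochastic means; rewards are treated as deterministic for simplicity).
    """
    rng = random.Random(seed)
    remaining = list(budgets)
    totals = [0.0] * len(arms)  # cumulative reward observed per arm
    pulls = [0] * len(arms)     # number of pulls per arm
    reward = 0.0
    for _ in range(horizon):
        if any(b <= 0 for b in remaining):  # a knapsack constraint binds
            break
        if 0 in pulls:                       # pull each arm once first
            a = pulls.index(0)
        elif rng.random() < eps:             # explore uniformly
            a = rng.randrange(len(arms))
        else:                                # exploit best empirical mean
            a = max(range(len(arms)), key=lambda i: totals[i] / pulls[i])
        mean_r, costs = arms[a]
        reward += mean_r
        totals[a] += mean_r
        pulls[a] += 1
        for j, c in enumerate(costs):
            remaining[j] -= c
    return reward, remaining

# Two arms, one resource type, budget 10, horizon 100: the high-reward arm
# also consumes the resource faster, so the budget, not the horizon, ends play.
total, left = run_bwk([(1.0, [1.0]), (0.5, [0.2])], [10.0], 100)
```

In a non-stationary environment, the means inside `arms` would drift over rounds; the paper's non-stationarity measures quantify how much such drift the problem can tolerate while sublinear regret remains achievable.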
Pages: 11