Non-stationary Bandits with Knapsacks

被引:0
|
作者
Liu, Shang [1 ]
Jiang, Jiashuo [2 ]
Li, Xiaocheng [1 ]
机构
[1] Imperial Coll London, Imperial Coll Business Sch, London, England
[2] NYU, Stern Sch Business, New York, NY USA
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-arm bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time, the decision maker/player chooses to play an arm, and s/he will receive a reward and consume certain amount of resource from each of the multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to some knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or adversarial environment. Our paper considers a non-stationary environment which continuously interpolates these two extremes. We first show that the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for a sublinear regret due to the presence of the constraints, and then we propose a new notion of global non-stationarity measure. We employ both non-stationarity measures to derive upper and lower bounds for the problem. Our results are based on a primal-dual analysis of the underlying linear programs and highlight the interplay between the constraints and the non-stationarity. Finally, we also extend the non-stationarity measure to the problem of online convex optimization with constraints and obtain new regret bounds accordingly.
引用
收藏
页数:11
相关论文
共 50 条
  • [41] Bandits with Knapsacks beyond the Worst Case
    Sankararaman, Karthik Abinav
    Slivkins, Aleksandrs
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021, 34
  • [42] Contextual Bandits with Knapsacks for a Conversion Model
    Li, Zhen
    Stoltz, Gilles
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 35, NEURIPS 2022, 2022,
  • [43] Non-Stationary Linear Bandits With Dimensionality Reduction for Large-Scale Recommender Systems
    Ghoorchian, Saeed
    Kortukov, Evgenii
    Maghsudi, Setareh
    IEEE OPEN JOURNAL OF SIGNAL PROCESSING, 2024, 5 : 548 - 558
  • [44] Implementation of Exploration in TONIC Using Non-stationary Volatile Multi-arm Bandits
    Shaha, Aditya
    Arya, Dhruv
    Tripathy, B. K.
    SOFT COMPUTING FOR PROBLEM SOLVING, SOCPROS 2018, VOL 1, 2020, 1048 : 239 - 250
  • [45] A Risk-Averse Framework for Non-Stationary Stochastic Multi-Armed Bandits
    Alami, Reda
    Mahfoud, Mohammed
    Achab, Mastane
    2023 23RD IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS, ICDMW 2023, 2023, : 272 - 280
  • [46] Contextual Multi-Armed Bandits for Non-Stationary Heterogeneous Mobile Edge Computing
    Wirth, Maximilian
    Ortiz, Andrea
    Klein, Anja
    IEEE CONFERENCE ON GLOBAL COMMUNICATIONS, GLOBECOM, 2023, : 5599 - 5604
  • [47] A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free
    Chen, Yifang
    Lee, Chung-Wei
    Luo, Haipeng
    Wei, Chen-Yu
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99
  • [48] An Optimization-based Algorithm for Non-stationary Kernel Bandits without Prior Knowledge
    Hong, Kihyuk
    Li, Yuhang
    Tewari, Ambuj
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 206, 2023, 206
  • [49] Risk-Aware Bandits for Digital Twin Placement in Non-Stationary Mobile Edge Computing
    Wirth, Maximilian
    Ortiz, Andrea
    Klein, Anja
    2024 IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS WORKSHOPS, ICC WORKSHOPS 2024, 2024, : 13 - 18
  • [50] Stochastic Multi-Armed Bandits with Non-Stationary Rewards Generated by a Linear Dynamical System
    Gornet, Jonathan
    Hosseinzadeh, Mehdi
    Sinopoli, Bruno
    2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC), 2022, : 1460 - 1465