Non-stationary Bandits with Knapsacks

Cited: 0
Authors
Liu, Shang [1 ]
Jiang, Jiashuo [2 ]
Li, Xiaocheng [1 ]
Affiliations
[1] Imperial Coll London, Imperial Coll Business Sch, London, England
[2] NYU, Stern Sch Business, New York, NY USA
Keywords
DOI
Not available
CLC number
TP18 [Artificial Intelligence Theory];
Discipline codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In this paper, we study the problem of bandits with knapsacks (BwK) in a non-stationary environment. The BwK problem generalizes the multi-armed bandit (MAB) problem to model the resource consumption associated with playing each arm. At each time step, the decision maker plays an arm, receives a reward, and consumes a certain amount of each of multiple resource types. The objective is to maximize the cumulative reward over a finite horizon subject to knapsack constraints on the resources. Existing works study the BwK problem under either a stochastic or an adversarial environment. Our paper considers a non-stationary environment that continuously interpolates between these two extremes. We first show that, due to the presence of the constraints, the traditional notion of variation budget is insufficient to characterize the non-stationarity of the BwK problem for a sublinear regret, and we then propose a new notion of global non-stationarity measure. We employ both non-stationarity measures to derive upper and lower bounds for the problem. Our results are based on a primal-dual analysis of the underlying linear programs and highlight the interplay between the constraints and the non-stationarity. Finally, we extend the non-stationarity measure to the problem of online convex optimization with constraints and obtain new regret bounds accordingly.
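The BwK interaction protocol described in the abstract (play an arm, receive a reward, consume resources, stop when a budget is exhausted) can be sketched as a small simulation. The instance parameters and the UCB-style baseline below are hypothetical illustrations of the problem setting only, not the algorithm proposed in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stationary BwK instance: K arms, d resource types.
K, d = 3, 2
T = 1000                           # finite horizon
B = np.array([120.0, 150.0])       # knapsack budget per resource type

# Mean rewards and mean consumptions (unknown to the player).
mean_reward = np.array([0.6, 0.5, 0.3])
mean_cost = np.array([[0.30, 0.10],
                      [0.10, 0.20],
                      [0.05, 0.05]])

def play(arm):
    """Draw a Bernoulli reward and a Bernoulli consumption per resource."""
    r = rng.binomial(1, mean_reward[arm])
    c = rng.binomial(1, mean_cost[arm])   # vector of d consumptions
    return r, c

# Generic UCB loop with budget stopping (a baseline, NOT the paper's method).
pulls = np.zeros(K)
reward_sum = np.zeros(K)
spent = np.zeros(d)
total_reward = 0.0

for t in range(T):
    if np.any(spent >= B):        # stop once any resource is exhausted
        break
    if t < K:
        arm = t                   # play each arm once to initialize
    else:
        ucb = reward_sum / pulls + np.sqrt(2 * np.log(t + 1) / pulls)
        arm = int(np.argmax(ucb))
    r, c = play(arm)
    pulls[arm] += 1
    reward_sum[arm] += r
    spent += c
    total_reward += r

print(total_reward)
```

In the non-stationary setting studied by the paper, `mean_reward` and `mean_cost` would instead drift over time, and the point of the analysis is how much such drift (measured by a variation budget or by the proposed global non-stationarity measure) degrades achievable regret.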
Pages: 11
Related papers
50 results
  • [21] A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits
    Abbasi-Yadkori, Yasin
    Gyorgy, Andras
    Lazic, Nevena
    JOURNAL OF MACHINE LEARNING RESEARCH, 2023, 24
  • [22] Stochastic Bandits With Non-Stationary Rewards: Reward Attack and Defense
    Yang, Chenye
    Liu, Guanlin
    Lai, Lifeng
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2024, 72 : 5007 - 5020
  • [23] Bandits with Knapsacks and Predictions
    Drago, Davide
    Celli, Andrea
    Elias, Marek
    UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, 2024, 244 : 1189 - 1206
  • [24] Adversarial Bandits with Knapsacks
    Immorlica, Nicole
    Sankararaman, Karthik Abinav
    Schapire, Robert
    Slivkins, Aleksandrs
    2019 IEEE 60TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS 2019), 2019, : 202 - 219
  • [25] Adversarial Bandits with Knapsacks
    Immorlica, Nicole
    Sankararaman, Karthik
    Schapire, Robert
    Slivkins, Aleksandrs
    JOURNAL OF THE ACM, 2022, 69 (06)
  • [26] Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits
    Saha, Aadirupa
    Gupta, Shubham
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022, : 19027 - 19049
  • [27] Non-stationary Continuum-armed Bandits for Online Hyperparameter Optimization
    Lu, Shiyin
    Zhou, Yu-Hang
    Shi, Jing-Cheng
    Zhu, Wenya
    Yu, Qingtao
    Chen, Qing-Guo
    Da, Qing
    Zhang, Lijun
    WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING, 2022, : 618 - 627
  • [28] Bandits with Knapsacks (Extended Abstract)
    Badanidiyuru, Ashwinkumar
    Kleinberg, Robert
    Slivkins, Aleksandrs
    2013 IEEE 54TH ANNUAL SYMPOSIUM ON FOUNDATIONS OF COMPUTER SCIENCE (FOCS), 2013, : 207 - 216
  • [29] On Logarithmic Regret for Bandits with Knapsacks
    Ren, Wenbo
    Liu, Jia
    Shroff, Ness B.
    2021 55TH ANNUAL CONFERENCE ON INFORMATION SCIENCES AND SYSTEMS (CISS), 2021
  • [30] Linear Contextual Bandits with Knapsacks
    Agrawal, Shipra
    Devanur, Nikhil R.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29