Robust Stochastic Multi-Armed Bandits with Historical Data

被引：0

作者：

Yacobi, Sarah Boufelja ^{[1
]}

Bounefouf, Djallel ^{[2
]}

机构：

[1] Imperial Coll London, London, England

[2] IBM Res, New York, NY USA

来源：

COMPANION OF THE WORLD WIDE WEB CONFERENCE, WWW 2023 | 2023年

关键词：

Contextual Bandits; Bias; Propensity scores; Robust optimization;

D O I：

10.1145/3543873.3587653

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We consider the problem of Stochastic Contextual Multi-Armed Bandits (CMABs) initialised with Historical data. Initialisation with historical data is an example of data-driven regularisation which should, in theory, accelerate the convergence of CMABs. However, in practice, we have little to no control over the underlying generation process of such data, which may exhibit some pathologies, possibly impeding the convergence and the stability of the algorithm. In this paper, we focus on two main challenges: bias selection and data corruption. We propose two new algorithms to solve these specific issues: LinUCB with historical data and offline balancing (OB-HLinUCB) and Robust LinUCB with corrupted historical data (R-HLinUCB). We derive their theoretical regret bounds and discuss their computational performance using real-world datasets.

引用

页码：959 / 965

页数：7

共 50 条

[1] Robust Risk-Averse Stochastic Multi-armed Bandits
Maillard, Odalric-Ambrym
ALGORITHMIC LEARNING THEORY (ALT 2013), 2013, 8139 : 218 - 233
[2] Stochastic Multi-Armed Bandits with Control Variates
Verma, Arun
Hanawal, Manjesh K.
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
[3] Stochastic Multi-armed Bandits in Constant Space
Liau, David
Price, Eric
Song, Zhao
Yang, Ger
INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
[4] Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
Lancewicki, Tal
Segal, Shahar
Koren, Tomer
Mansour, Yishay
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
[5] Networked Stochastic Multi-Armed Bandits with Combinatorial Strategies
Tang, Shaojie
Zhou, Yaqin
Han, Kai
Zhang, Zhao
Yuan, Jing
Wu, Weili
2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017, : 786 - 793
[6] Anytime optimal algorithms in stochastic multi-armed bandits
Degenne, Remy
Perchet, Vianney
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
[7] Parametrized Stochastic Multi-armed Bandits with Binary Rewards
Jiang, Chong
Srikant, R.
2011 AMERICAN CONTROL CONFERENCE, 2011, : 119 - 124
[8] Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits
Wang, Zhiwei
Wang, Huazheng
Wang, Hongning
THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 15770 - 15777
[9] Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets
Wan, Zongqi
Zhang, Zhijie
Li, Tongyang
Zhang, Jialin
Sun, Xiaoming
THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 10087 - 10094
[10] On Kernelized Multi-armed Bandits
Chowdhury, Sayak Ray
Gopalan, Aditya
INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70

← 1 2 3 4 5 →