Robust Stochastic Multi-Armed Bandits with Historical Data

被引:0
|
作者
Yacobi, Sarah Boufelja [1 ]
Bounefouf, Djallel [2 ]
机构
[1] Imperial Coll London, London, England
[2] IBM Res, New York, NY USA
关键词
Contextual Bandits; Bias; Propensity scores; Robust optimization;
D O I
10.1145/3543873.3587653
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We consider the problem of Stochastic Contextual Multi-Armed Bandits (CMABs) initialised with Historical data. Initialisation with historical data is an example of data-driven regularisation which should, in theory, accelerate the convergence of CMABs. However, in practice, we have little to no control over the underlying generation process of such data, which may exhibit some pathologies, possibly impeding the convergence and the stability of the algorithm. In this paper, we focus on two main challenges: bias selection and data corruption. We propose two new algorithms to solve these specific issues: LinUCB with historical data and offline balancing (OB-HLinUCB) and Robust LinUCB with corrupted historical data (R-HLinUCB). We derive their theoretical regret bounds and discuss their computational performance using real-world datasets.
引用
收藏
页码:959 / 965
页数:7
相关论文
共 50 条
  • [1] Robust Risk-Averse Stochastic Multi-armed Bandits
    Maillard, Odalric-Ambrym
    ALGORITHMIC LEARNING THEORY (ALT 2013), 2013, 8139 : 218 - 233
  • [2] Stochastic Multi-Armed Bandits with Control Variates
    Verma, Arun
    Hanawal, Manjesh K.
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021), 2021,
  • [3] Stochastic Multi-armed Bandits in Constant Space
    Liau, David
    Price, Eric
    Song, Zhao
    Yang, Ger
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84
  • [4] Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
    Lancewicki, Tal
    Segal, Shahar
    Koren, Tomer
    Mansour, Yishay
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 139, 2021, 139
  • [5] Networked Stochastic Multi-Armed Bandits with Combinatorial Strategies
    Tang, Shaojie
    Zhou, Yaqin
    Han, Kai
    Zhang, Zhao
    Yuan, Jing
    Wu, Weili
    2017 IEEE 37TH INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2017), 2017, : 786 - 793
  • [6] Anytime optimal algorithms in stochastic multi-armed bandits
    Degenne, Remy
    Perchet, Vianney
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [7] Parametrized Stochastic Multi-armed Bandits with Binary Rewards
    Jiang, Chong
    Srikant, R.
    2011 AMERICAN CONTROL CONFERENCE, 2011, : 119 - 124
  • [8] Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits
    Wang, Zhiwei
    Wang, Huazheng
    Wang, Hongning
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 14, 2024, : 15770 - 15777
  • [9] Quantum Multi-Armed Bandits and Stochastic Linear Bandits Enjoy Logarithmic Regrets
    Wan, Zongqi
    Zhang, Zhijie
    Li, Tongyang
    Zhang, Jialin
    Sun, Xiaoming
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 8, 2023, : 10087 - 10094
  • [10] On Kernelized Multi-armed Bandits
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70