Robust Heavy-Tailed Linear Bandits Algorithm

Authors
Ma L. [1]
Zhao P. [1]
Zhou Z. [1]
Affiliations
[1] National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing
Funding
National Natural Science Foundation of China
Keywords
distribution change; heavy-tailed noise; linear bandits; machine learning; open environment learning;
DOI
10.7544/issn1000-1239.202220279
Abstract
The linear bandits model is one of the most foundational online learning models, in which a linear function parametrizes the mean payoff of each arm. It encompasses a wide range of applications with strong theoretical guarantees and practical modeling ability. However, existing algorithms suffer from the data irregularities that frequently emerge in real-world applications, since the data are usually collected from open and dynamic environments. In this paper, we are particularly concerned with two kinds of data irregularity: the underlying regression parameter may change over time, and the noise may be unbounded or even fail to be sub-Gaussian; these are referred to as model drift and heavy-tailed noise, respectively. To deal with these two hostile factors, we propose a novel algorithm based on the upper confidence bound. The median-of-means estimator is used to handle the potential heavy-tailed noise, and a restarting mechanism is employed to tackle model drift. Theoretically, we establish a minimax lower bound to characterize the difficulty of the problem and prove that our algorithm enjoys a no-regret upper bound. The attained results subsume previous analyses for scenarios without either model drift or heavy-tailed noise. Empirically, we additionally design several online ensemble techniques to make our algorithm more adaptive to the environment. Extensive experiments are conducted on synthetic and real-world datasets to validate the effectiveness of the proposed approach. © 2023 Science Press. All rights reserved.
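The abstract's key robustness ingredient, the median-of-means estimator, can be illustrated with a minimal sketch: partition the observations into groups, average each group, and take the median of the group means. The function name and fixed partitioning below are illustrative assumptions, not the paper's exact construction (the paper embeds the estimator inside a linear-bandit confidence-bound procedure).

```python
import statistics

def median_of_means(samples, k):
    """Median-of-means estimate of the mean of `samples`.

    Split the samples into k equal-size groups, compute each group's
    mean, and return the median of those means. Unlike the plain
    empirical mean, a few extreme (heavy-tailed) observations can
    corrupt at most a few group means, and the median ignores them.
    """
    m = len(samples) // k  # samples per group (remainder is dropped)
    group_means = [statistics.fmean(samples[i * m:(i + 1) * m])
                   for i in range(k)]
    return statistics.median(group_means)

# Nine well-behaved observations plus one heavy-tailed outlier:
data = [1.0] * 9 + [1000.0]
robust = median_of_means(data, k=5)   # outlier confined to one group
naive = statistics.fmean(data)        # plain mean, dragged by the outlier
```

Here `robust` stays at 1.0 while `naive` jumps to 100.9, which is the basic reason median-of-means yields usable confidence bounds under heavy-tailed payoff noise.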
Pages: 1385-1395 (10 pages)