Robust Heavy-Tailed Linear Bandits Algorithm

Cited by: 0
Authors
Ma L. [1 ]
Zhao P. [1 ]
Zhou Z. [1 ]
Affiliations
[1] National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing
Funding
National Natural Science Foundation of China
Keywords
distribution change; heavy-tailed noise; linear bandits; machine learning; open environment learning;
DOI
10.7544/issn1000-1239.202220279
Abstract
The linear bandits model is one of the most foundational online learning models, in which a linear function parametrizes the mean payoff of each arm. It covers a wide range of applications with strong theoretical guarantees and practical modeling capability. However, existing algorithms suffer from the data irregularities that frequently emerge in real-world applications, because the data are usually collected from open and dynamic environments. In this paper, we are particularly concerned with two kinds of data irregularity: the underlying regression parameter may change over time, and the noise may be unbounded or even not sub-Gaussian; these are referred to as model drift and heavy-tailed noise, respectively. To deal with these two hostile factors, we propose a novel algorithm based on the upper confidence bound. The median-of-means estimator is used to handle potential heavy-tailed noise, and a restarting mechanism is employed to tackle model drift. Theoretically, we establish a minimax lower bound to characterize the difficulty of the problem and prove that our algorithm enjoys a no-regret upper bound. The attained results subsume previous analyses for scenarios without either model drift or heavy-tailed noise. Empirically, we further design several online ensemble techniques to make our algorithm more adaptive to the environment. Extensive experiments on synthetic and real-world datasets validate its effectiveness. © 2023 Science Press. All rights reserved.
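To make the two main ingredients mentioned in the abstract concrete, the following is a minimal, illustrative Python sketch, not the authors' algorithm: it combines a median-of-means reward estimate with a periodic restart schedule inside a UCB-style loop, and, for simplicity, it uses a plain multi-armed setting rather than the linear bandits model studied in the paper. The names median_of_means and RestartedMoMUCB, the restart period, and all constants are hypothetical choices for this demo.

```python
import numpy as np


def median_of_means(samples: np.ndarray, n_groups: int) -> float:
    """Split samples into n_groups blocks, average each block, return the median of the block means."""
    n_groups = max(1, min(n_groups, len(samples)))
    blocks = np.array_split(samples, n_groups)
    return float(np.median([block.mean() for block in blocks]))


class RestartedMoMUCB:
    """UCB over arms using median-of-means estimates, with periodic restarts for drifting models."""

    def __init__(self, n_arms: int, restart_period: int, n_groups: int = 5, scale: float = 1.0):
        self.n_arms, self.restart_period = n_arms, restart_period
        self.n_groups, self.scale = n_groups, scale
        self.reset()

    def reset(self):
        # Forget all past rewards so stale data from before a drift is discarded.
        self.rewards = [[] for _ in range(self.n_arms)]
        self.t = 0

    def select_arm(self) -> int:
        self.t += 1
        if self.t % self.restart_period == 0:
            self.reset()
            self.t = 1
        # Play each arm once before using confidence bounds.
        for a in range(self.n_arms):
            if not self.rewards[a]:
                return a
        # Median-of-means estimate plus an exploration bonus.
        ucb = [
            median_of_means(np.array(r), self.n_groups)
            + self.scale * np.sqrt(np.log(self.t) / len(r))
            for r in self.rewards
        ]
        return int(np.argmax(ucb))

    def update(self, arm: int, reward: float):
        self.rewards[arm].append(reward)


# Usage: heavy-tailed (Student's t) noise around arm means that shift halfway through the run.
rng = np.random.default_rng(0)
means = np.array([0.2, 0.5, 0.8])
agent = RestartedMoMUCB(n_arms=3, restart_period=500)
for step in range(1000):
    if step == 500:
        means = means[::-1]  # abrupt model drift
    arm = agent.select_arm()
    agent.update(arm, means[arm] + rng.standard_t(df=2.5))
```

The median of block means is robust to occasional extreme rewards that would corrupt an empirical mean, while the restart schedule caps how long outdated observations can influence the index; the paper's actual method applies these ideas within a linear regression estimator rather than per-arm averages.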
Pages: 1385-1395
Page count: 10