A New Look at Dynamic Regret for Non-Stationary Stochastic Bandits

Cited: 0
Authors
Abbasi-Yadkori, Yasin [1]
Gyorgy, Andras [1]
Lazic, Nevena [2 ]
Affiliations
[1] Google DeepMind, London, England
[2] Google DeepMind, Mountain View, CA, USA
Keywords
Online learning; multi-armed bandits; non-stationary learning; dynamic regret; tracking
DOI
Not available
Chinese Library Classification
TP [Automation Technology, Computer Technology];
Discipline Classification Code
0812;
Abstract
We study the non-stationary stochastic multi-armed bandit problem, where the reward statistics of each arm may change several times during the course of learning. The performance of a learning algorithm is evaluated in terms of its dynamic regret, which is defined as the difference between the expected cumulative reward of an agent choosing the optimal arm in every time step and the cumulative reward of the learning algorithm. One way to measure the hardness of such environments is to consider how many times the identity of the optimal arm can change. We propose a method that achieves, in K-armed bandit problems, a near-optimal $\tilde{O}(\sqrt{KN(S+1)})$ dynamic regret, where N is the time horizon of the problem and S is the number of times the identity of the optimal arm changes, without prior knowledge of S. Previous works for this problem obtain regret bounds that scale with the number of changes (or the amount of change) in the reward functions, which can be much larger, or assume prior knowledge of S to achieve similar bounds.
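To make the abstract's quantities concrete, the definitions can be written out as follows (a sketch in notation of my own choosing, not taken verbatim from the paper: $\mu_{k,t}$ denotes the mean reward of arm $k$ at time $t$, and $A_t$ the arm the algorithm pulls; compiling the display requires the amsmath and amssymb packages):

% Dynamic regret over horizon N: the benchmark plays the optimal arm at every step.
\[
  R_N \;=\; \sum_{t=1}^{N} \max_{k \in [K]} \mu_{k,t}
  \;-\; \mathbb{E}\!\left[ \sum_{t=1}^{N} \mu_{A_t,\,t} \right]
\]
% S counts how often the identity of the optimal arm changes over the horizon.
\[
  S \;=\; \sum_{t=2}^{N} \mathbf{1}\!\left\{ \operatorname*{arg\,max}_{k \in [K]} \mu_{k,t}
  \;\neq\; \operatorname*{arg\,max}_{k \in [K]} \mu_{k,t-1} \right\}
\]
% The bound stated in the abstract, achieved without prior knowledge of S:
\[
  R_N \;=\; \tilde{O}\!\left( \sqrt{K N (S + 1)} \right)
\]

Note that $S$ counts changes in the identity of the best arm, which can be far smaller than the number of times the reward distributions themselves change; this is exactly the distinction the abstract draws with prior work.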
Pages: 37