Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems

Cited by: 51
Authors
Koulouriotis, D. E. [1 ]
Xanthopoulos, A. [1 ]
Affiliation
[1] Democritus Univ Thrace, Sch Engn, Dept Prod & Management Engn, Dragana, Greece
Keywords
decision-making agents; action selection; exploration-exploitation; multi-armed bandit; genetic algorithms; reinforcement learning;
DOI
10.1016/j.amc.2007.07.043
CLC number
O29 [Applied Mathematics]
Discipline code
070104
Abstract
Multi-armed bandit tasks have been extensively used to model the problem of balancing exploitation and exploration. One of the most challenging variants of the MABP is the non-stationary bandit problem, where the agent faces the additional complexity of detecting changes in its environment. In this paper we examine a non-stationary, discrete-time, finite-horizon bandit problem with a finite number of arms and Gaussian rewards. A family of important ad hoc methods exists that are suitable for non-stationary bandit tasks. These learning algorithms, which offer intuition-based solutions to the exploitation-exploration trade-off, have the advantage of not relying on strong theoretical assumptions while at the same time being tunable to produce near-optimal results. An entirely different approach to the non-stationary multi-armed bandit problem presents itself in the form of evolutionary algorithms. We present an evolutionary algorithm implemented to solve the non-stationary bandit problem, along with ad hoc solution algorithms, namely action-value methods with ε-greedy and softmax action selection rules, the probability matching method, and the adaptive pursuit method. A number of simulation-based experiments were conducted, and based on the numerical results obtained we discuss the methods' performance. (C) 2007 Elsevier Inc. All rights reserved.
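The action-value methods mentioned in the abstract can be illustrated with a short sketch. The snippet below is not the paper's implementation; it is a minimal, standard ε-greedy agent with a constant step size (a common ad hoc choice for non-stationary bandits, since it keeps weighting recent rewards) run on a Gaussian-reward bandit. All names and parameter values here are illustrative assumptions.

```python
import random

def epsilon_greedy_bandit(means, epsilon=0.1, alpha=0.1,
                          horizon=1000, sigma=1.0, seed=0):
    """Run an epsilon-greedy agent on a bandit with Gaussian rewards.

    `means` lists the true arm means; a constant step size `alpha`
    (rather than sample averages) lets estimates track a drifting
    environment. Returns the final action-value estimates.
    """
    rng = random.Random(seed)
    q = [0.0] * len(means)              # action-value estimates Q(a)
    for _ in range(horizon):
        if rng.random() < epsilon:      # explore: pick a uniformly random arm
            a = rng.randrange(len(means))
        else:                           # exploit: pick the current greedy arm
            a = max(range(len(means)), key=q.__getitem__)
        r = rng.gauss(means[a], sigma)  # Gaussian reward for the chosen arm
        q[a] += alpha * (r - q[a])      # constant-step-size update
    return q
```

A softmax rule would replace the explore/exploit branch with sampling proportional to exp(Q(a)/τ) for a temperature τ; probability matching and adaptive pursuit instead maintain and update explicit selection probabilities per arm.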
Pages: 913 - 922
Page count: 10
Related papers (50 total)
  • [1] The non-stationary stochastic multi-armed bandit problem
    Allesiardo R.
    Féraud R.
    Maillard O.-A.
    Springer Science and Business Media Deutschland GmbH (03): : 267 - 283
  • [2] DYNAMIC SPECTRUM ACCESS WITH NON-STATIONARY MULTI-ARMED BANDIT
    Alaya-Feki, Afef Ben Hadj
    Moulines, Eric
    LeCornec, Alain
    2008 IEEE 9TH WORKSHOP ON SIGNAL PROCESSING ADVANCES IN WIRELESS COMMUNICATIONS, VOLS 1 AND 2, 2008, : 416 - 420
  • [3] Multi-Armed Bandit Learning in IoT Networks: Learning Helps Even in Non-stationary Settings
    Bonnefoi, Remi
    Besson, Lilian
    Moy, Christophe
    Kaufmann, Emilie
    Palicot, Jacques
    COGNITIVE RADIO ORIENTED WIRELESS NETWORKS, 2018, 228 : 173 - 185
  • [4] LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments
    de Curto, J.
    de Zarza, I.
    Roig, Gemma
    Cano, Juan Carlos
    Manzoni, Pietro
    Calafate, Carlos T.
    ELECTRONICS, 2023, 12 (13)
  • [5] Bio-Inspired Meta-Learning for Active Exploration During Non-Stationary Multi-Armed Bandit Tasks
    Velentzas, George
    Tzafestas, Costas
    Khamassi, Mehdi
    PROCEEDINGS OF THE 2017 INTELLIGENT SYSTEMS CONFERENCE (INTELLISYS), 2017, : 661 - 669
  • [6] Mechanisms with learning for stochastic multi-armed bandit problems
    Shweta Jain
    Satyanath Bhat
    Ganesh Ghalme
    Divya Padmanabhan
    Y. Narahari
    Indian Journal of Pure and Applied Mathematics, 2016, 47 : 229 - 272
  • [7] MECHANISMS WITH LEARNING FOR STOCHASTIC MULTI-ARMED BANDIT PROBLEMS
    Jain, Shweta
    Bhat, Satyanath
    Ghalme, Ganesh
    Padmanabhan, Divya
    Narahari, Y.
    INDIAN JOURNAL OF PURE & APPLIED MATHEMATICS, 2016, 47 (02) : 229 - 272
  • [8] Satisficing in Multi-Armed Bandit Problems
    Reverdy, Paul
    Srivastava, Vaibhav
    Leonard, Naomi Ehrich
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2017, 62 (08) : 3788 - 3803
  • [9] Smart topology detection using multi-armed bandit reinforcement learning method
    Sonmez, Ferda Ozdemir
    Hankin, Chris
    Malacaria, Pasquale
    INFORMATION SECURITY JOURNAL, 2024
  • [10] Multi-armed Bandit Problems with Strategic Arms
    Braverman, Mark
    Mao, Jieming
    Schneider, Jon
    Weinberg, S. Matthew
    CONFERENCE ON LEARNING THEORY, VOL 99, 2019, 99