Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited by: 0
Authors
Pourshamsaei, Hossein [1 ]
Nobakhti, Amin [1 ]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran
Keywords
Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; MODEL;
DOI
10.1016/j.asoc.2024.111305
CLC classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) in non-stationary environments is a formidable challenge. In some applications, abrupt changes in the environment model can be anticipated in advance, yet the existing literature lacks a framework that proactively exploits such predictions to improve reward optimization. This paper introduces a methodology that leverages these predictions preemptively to maximize overall performance. It does so by forming a weighted mixture policy from the optimal policies of the prevailing and forthcoming environment models. To ensure safe learning, an adaptive learning rate is derived for training the weighted mixture policy, which theoretically guarantees monotonic performance improvement at each update. Empirical trials focus on a model-free predictive reference-tracking scenario with piecewise constant references. On the cart-pole position control problem, the proposed algorithm is shown to surpass prior techniques such as context Q-learning and RL with context detection in non-stationary environments. It also outperforms applying the individual optimal policy of each observed environment model (i.e., policies that do not use predictions).
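
As a rough illustration of the weighted-mixture idea described in the abstract, the minimal Python sketch below blends two action-probability vectors with a single weight. The function name, the fixed example weight, and the discrete-action setting are assumptions made for illustration only; the paper's actual method additionally derives an adaptive learning rate for safely training the mixture policy.

    # Hypothetical sketch of a weighted mixture policy (not the authors' code).
    # Assumes discrete actions and two given policies: one optimal for the
    # current environment model, one for the predicted upcoming model.
    import numpy as np

    def mixture_policy(pi_current, pi_next, w):
        """Blend two action-probability vectors with weight w in [0, 1].

        w = 0 follows the policy of the prevailing model only;
        w = 1 follows the policy of the forthcoming model only.
        """
        probs = (1.0 - w) * pi_current + w * pi_next
        return probs / probs.sum()  # renormalize against numerical drift

    # Example: three actions, weight shifted partly toward the predicted model
    pi_cur = np.array([0.7, 0.2, 0.1])   # optimal for the current model
    pi_nxt = np.array([0.1, 0.3, 0.6])   # optimal for the predicted model
    action = np.random.choice(3, p=mixture_policy(pi_cur, pi_nxt, w=0.4))

In this toy setting, increasing w before an anticipated model change gradually shifts behavior toward the policy that will be optimal after the change, rather than switching abruptly at the moment the change occurs.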
Pages: 16
Related papers
50 records in total
  • [21] Real-time dynamic pricing in a non-stationary environment using model-free reinforcement learning
    Rana, Rupal
    Oliveira, Fernando S.
    OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2014, 47 : 116 - 126
  • [22] Accelerated Variant of Reinforcement Learning Algorithms for Light Control with Non-stationary User Behaviour
    Haddam, Nassim
    Boulakia, Benjamin Cohen
    Barth, Dominique
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON SMART CITIES AND GREEN ICT SYSTEMS (SMARTGREENS), 2022, : 78 - 85
  • [23] Fundamental Limits of Age-of-Information in Stationary and Non-stationary Environments
    Banerjee, Subhankar
    Bhattacharjee, Rajarshi
    Sinha, Abhishek
    2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 1741 - 1746
  • [24] Multi-Agent Combat in Non-Stationary Environments
    Li, Shengang
    Chi, Haoang
    Xie, Tao
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [25] Online Learning for Non-Stationary A/B Tests
    Medina, Andres Munoz
    Vassilvitskii, Sergei
    Yin, Dong
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 317 - 326
  • [26] A Policy Search and Transfer Approach in the Non-stationary Environment
    Zhu F.
    Liu Q.
    Fu Q.-M.
    Chen D.-H.
    Wang H.
    Fu Y.-C.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2017, 45 (02): 257 - 266
  • [27] An RBF online learning scheme for non-stationary environments based on fuzzy means and Givens rotations
    Karamichailidou, Despina
    Koletsios, Sotirios
    Alexandridis, Alex
    NEUROCOMPUTING, 2022, 501 : 370 - 386
  • [28] A Model-free Reinforcement Learning Approach for the Energetic Control of a Building with Non-stationary User Behaviour
    Haddam, Nassim
    Boulakia, Benjamin Cohen
    Barth, Dominique
    2020 THE 4TH INTERNATIONAL CONFERENCE ON SMART GRID AND SMART CITIES (ICSGSC 2020), 2020, : 168 - 177
  • [29] Stochastic discretized learning-based weak estimation: a novel estimation method for non-stationary environments
    Yazidi, Anis
    Oommen, B. John
    Horn, Geir
    Granmo, Ole-Christoffer
    PATTERN RECOGNITION, 2016, 60 : 430 - 443
  • [30] Non-stationary Dueling Bandits for Online Learning to Rank
    Lu, Shiyin
    Miao, Yuan
    Yang, Ping
    Hu, Yao
    Zhang, Lijun
    WEB AND BIG DATA, PT II, APWEB-WAIM 2022, 2023, 13422 : 166 - 174