Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited by: 0
Authors
Pourshamsaei, Hossein [1]
Nobakhti, Amin [1]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran
Keywords
Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; MODEL;
DOI
10.1016/j.asoc.2024.111305
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement Learning (RL) within non-stationary environments presents a formidable challenge. In some applications, anticipating abrupt alterations in the environment model might be possible. The existing literature lacks a framework that proactively harnesses such predictions to enhance reward optimization. This paper introduces an innovative methodology designed to preemptively leverage these predictions, thereby maximizing the overall achieved performance. This is executed by formulating a novel approach that generates a weighted mixture policy from both the optimal policies of the prevailing and forthcoming models. To ensure safe learning, an adaptive learning rate is derived to facilitate training of the weighted mixture policy. This theoretically guarantees monotonic performance improvement at each update during training. Empirical trials focus on a model-free predictive reference tracking scenario involving piecewise constant references. Through the utilization of the cart-pole position control problem, it is demonstrated that the proposed algorithm surpasses prior techniques such as context Q-learning and RL with context detection algorithms in nonstationary environments. Moreover, the algorithm outperforms the application of individual optimal policies derived from each observed environment model (i.e., policies not utilizing predictions).
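The abstract does not give the form of the weighted mixture policy, so the following is only a minimal sketch of one common reading: a convex combination of two action distributions for a single state. The function name `mixture_policy`, the fixed scalar weight `w`, and the example probabilities are all assumptions made for illustration. The paper's actual weighting scheme and its adaptive learning rate are not reproduced here.

```python
def mixture_policy(pi_current, pi_next, w):
    """Blend two action distributions for one state.

    pi_current: action probabilities under the policy for the current model.
    pi_next:    action probabilities under the policy for the predicted next model.
    w:          hypothetical mixture weight in [0, 1] placed on pi_current.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    if len(pi_current) != len(pi_next):
        raise ValueError("both policies must cover the same action set")
    # A convex combination of two probability vectors is itself a probability vector.
    return [w * p + (1.0 - w) * q for p, q in zip(pi_current, pi_next)]


# Illustrative values only: two actions, with the weight favouring the current policy.
pi_cur = [0.9, 0.1]
pi_nxt = [0.2, 0.8]
mixed = mixture_policy(pi_cur, pi_nxt, 0.75)
```

With `w = 1` the sketch reduces to the current model's policy and with `w = 0` to the forthcoming model's policy. Intermediate weights let the agent begin shifting behaviour before the predicted change occurs.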
Pages: 16
Related papers
50 in total
  • [11] P-MARL: Prediction-Based Multi-Agent Reinforcement Learning for Non-Stationary Environments
    Marinescu, Andrei
    Dusparic, Ivana
    Taylor, Adam
    Cahill, Vinny
    Clarke, Siobhan
    PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS (AAMAS'15), 2015, : 1897 - 1898
  • [12] Towards optimal HVAC control in non-stationary building environments combining active change detection and deep reinforcement learning
    Deng, Xiangtian
    Zhang, Yi
    Qi, He
    BUILDING AND ENVIRONMENT, 2022, 211
  • [13] On Optimal Power Control for URLLC over a Non-stationary Wireless Channel using Contextual Reinforcement Learning
    Sharma, Mohit K.
    Sun, Sumei
    Kurniawan, Ernest
    Tan, Peng Hui
    IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 5493 - 5498
  • [14] Bandit Convex Optimization in Non-stationary Environments
    Zhao, Peng
    Wang, Guanghui
    Zhang, Lijun
    Zhou, Zhi-Hua
    JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
  • [15] Homogenization of random diffusions in non-stationary environments
    Boryc, Marcin
    Komorowski, Tomasz
    ASYMPTOTIC ANALYSIS, 2014, 90 (1-2) : 1 - 20
  • [16] A 2-phase prediction of a non-stationary time-series by Taylor series and reinforcement learning
    Dey, Debolina
    Ghosh, Lidia
    Bhattacharya, Diptendu
    Konar, Amit
    APPLIED SOFT COMPUTING, 2023, 145
  • [17] Traffic Scheduling in Non-Stationary Multipath Non-Terrestrial Networks: A Reinforcement Learning Approach
    Machumilane, Achilles
    Gotta, Alberto
    Cassara, Pietro
    Gennaro, Claudio
    Amato, Giuseppe
    ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 4094 - 4099
  • [18] Adaptation method of the exploration ratio based on the orientation of equilibrium in multi-agent reinforcement learning under non-stationary environments
    Okano T.
    Noda I.
    Limited F.
    1600, Fuji Technology Press (21): : 939 - 947
  • [19] Context Detection and Identification In Multi-Agent Reinforcement Learning With Non-Stationary Environment
    Selamet, Ekrem Talha
    Tumer, Borahan
    2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022,
  • [20] Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
    Koulouriotis, D. E.
    Xanthopoulos, A.
    APPLIED MATHEMATICS AND COMPUTATION, 2008, 196 (02) : 913 - 922