Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited by: 0

Authors:

Pourshamsaei, Hossein [1]

Nobakhti, Amin [1]

Affiliations:

[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran

Source:

APPLIED SOFT COMPUTING | 2024 / Vol. 153

Keywords:

Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; MODEL;

DOI:

10.1016/j.asoc.2024.111305

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement Learning (RL) within non-stationary environments presents a formidable challenge. In some applications, anticipating abrupt alterations in the environment model might be possible. The existing literature lacks a framework that proactively harnesses such predictions to enhance reward optimization. This paper introduces an innovative methodology designed to preemptively leverage these predictions, thereby maximizing the overall achieved performance. This is executed by formulating a novel approach that generates a weighted mixture policy from both the optimal policies of the prevailing and forthcoming models. To ensure safe learning, an adaptive learning rate is derived to facilitate training of the weighted mixture policy. This theoretically guarantees monotonic performance improvement at each update during training. Empirical trials focus on a model-free predictive reference tracking scenario involving piecewise constant references. Through the utilization of the cart-pole position control problem, it is demonstrated that the proposed algorithm surpasses prior techniques such as context Q-learning and RL with context detection algorithms in non-stationary environments. Moreover, the algorithm outperforms the application of individual optimal policies derived from each observed environment model (i.e., policies not utilizing predictions).
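
As a reading aid, the idea of a weighted mixture policy can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not specify how the weights are parameterized or trained, so the tabular policies, the single scalar `weight`, and the function name `mixture_policy` below are all assumptions made for illustration. The sketch only shows that a convex combination of two stochastic policies is itself a valid policy.

```python
import numpy as np


def mixture_policy(pi_current, pi_next, weight):
    """Convex combination of two tabular stochastic policies.

    pi_current, pi_next: arrays of shape (n_states, n_actions) whose rows
        each sum to 1 (hypothetical stand-ins for the optimal policies of
        the prevailing and the forthcoming environment model).
    weight: scalar in [0, 1]; the share given to pi_next. A single scalar
        weight is an assumption of this sketch, not taken from the paper.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    pi_current = np.asarray(pi_current, dtype=float)
    pi_next = np.asarray(pi_next, dtype=float)
    if pi_current.shape != pi_next.shape:
        raise ValueError("both policies must have the same shape")
    # Each row of the result is still a probability distribution,
    # because it is a convex combination of two distributions.
    return (1.0 - weight) * pi_current + weight * pi_next


# Toy example: two states, two actions.
pi_a = np.array([[1.0, 0.0], [0.5, 0.5]])
pi_b = np.array([[0.0, 1.0], [0.5, 0.5]])
pi_mix = mixture_policy(pi_a, pi_b, 0.25)
```

With `weight = 0` the sketch reduces to the current model's policy and with `weight = 1` to the forthcoming model's policy; intermediate values blend the two, which is the general sense in which the abstract speaks of acting ahead of a predicted change.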

Pages: 16

50 items in total

[11] P-MARL: Prediction-Based Multi-Agent Reinforcement Learning for Non-Stationary Environments
Marinescu, Andrei
Dusparic, Ivana
Taylor, Adam
Cahill, Vinny
Clarke, Siobhan
PROCEEDINGS OF THE 2015 INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS & MULTIAGENT SYSTEMS (AAMAS'15), 2015, : 1897 - 1898
[12] Towards optimal HVAC control in non-stationary building environments combining active change detection and deep reinforcement learning
Deng, Xiangtian
Zhang, Yi
Qi, He
BUILDING AND ENVIRONMENT, 2022, 211
[13] On Optimal Power Control for URLLC over a Non-stationary Wireless Channel using Contextual Reinforcement Learning
Sharma, Mohit K.
Sun, Sumei
Kurniawan, Ernest
Tan, Peng Hui
IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS (ICC 2022), 2022, : 5493 - 5498
[14] Bandit Convex Optimization in Non-stationary Environments
Zhao, Peng
Wang, Guanghui
Zhang, Lijun
Zhou, Zhi-Hua
JOURNAL OF MACHINE LEARNING RESEARCH, 2021, 22
[15] Homogenization of random diffusions in non-stationary environments
Boryc, Marcin
Komorowski, Tomasz
ASYMPTOTIC ANALYSIS, 2014, 90 (1-2) : 1 - 20
[16] A 2-phase prediction of a non-stationary time-series by Taylor series and reinforcement learning
Dey, Debolina
Ghosh, Lidia
Bhattacharya, Diptendu
Konar, Amit
APPLIED SOFT COMPUTING, 2023, 145
[17] Traffic Scheduling in Non-Stationary Multipath Non-Terrestrial Networks: A Reinforcement Learning Approach
Machumilane, Achilles
Gotta, Alberto
Cassara, Pietro
Gennaro, Claudio
Amato, Giuseppe
ICC 2023-IEEE INTERNATIONAL CONFERENCE ON COMMUNICATIONS, 2023, : 4094 - 4099
[18] Adaptation method of the exploration ratio based on the orientation of equilibrium in multi-agent reinforcement learning under non-stationary environments
Okano T.
Noda I.
Limited F.
1600, Fuji Technology Press (21) : 939 - 947
[19] Context Detection and Identification In Multi-Agent Reinforcement Learning With Non-Stationary Environment
Selamet, Ekrem Talha
Tumer, Borahan
2022 30TH SIGNAL PROCESSING AND COMMUNICATIONS APPLICATIONS CONFERENCE, SIU, 2022
[20] Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
Koulouriotis, D. E.
Xanthopoulos, A.
APPLIED MATHEMATICS AND COMPUTATION, 2008, 196 (02) : 913 - 922
