Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited by: 0

Authors:

Pourshamsaei, Hossein [1]

Nobakhti, Amin [1]

Affiliations:

[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran

Source:

APPLIED SOFT COMPUTING | 2024 / Vol. 153

Keywords:

Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; MODEL;

DOI:

10.1016/j.asoc.2024.111305

Chinese Library Classification number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Reinforcement Learning (RL) within non-stationary environments presents a formidable challenge. In some applications, anticipating abrupt alterations in the environment model might be possible. The existing literature lacks a framework that proactively harnesses such predictions to enhance reward optimization. This paper introduces an innovative methodology designed to preemptively leverage these predictions, thereby maximizing the overall achieved performance. This is executed by formulating a novel approach that generates a weighted mixture policy from both the optimal policies of the prevailing and forthcoming models. To ensure safe learning, an adaptive learning rate is derived to facilitate training of the weighted mixture policy. This theoretically guarantees monotonic performance improvement at each update during training. Empirical trials focus on a model-free predictive reference tracking scenario involving piecewise constant references. Through the utilization of the cart-pole position control problem, it is demonstrated that the proposed algorithm surpasses prior techniques such as context Q-learning and RL with context detection algorithms in non-stationary environments. Moreover, the algorithm outperforms the application of individual optimal policies derived from each observed environment model (i.e., policies not utilizing predictions).

Pages: 16

50 entries in total

[41] Mixed Reinforcement Learning for Efficient Policy Optimization in Stochastic Environments
Mu, Yao
Peng, Baiyu
Gu, Ziqing
Li, Shengbo Eben
Liu, Chang
Nie, Bingbing
Zheng, Jianfeng
Zhang, Bo
2020 20TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND SYSTEMS (ICCAS), 2020, : 1212 - 1219
[42] Optimal policy for a dynamic, non-stationary, stochastic inventory problem with capacity commitment
Xu, Ningxiong
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2009, 199 (02) : 400 - 408
[43] LLM-Informed Multi-Armed Bandit Strategies for Non-Stationary Environments
de Curto, J.
de Zarza, I.
Roig, Gemma
Cano, Juan Carlos
Manzoni, Pietro
Calafate, Carlos T.
ELECTRONICS, 2023, 12 (13)
[44] Statistics of extreme ocean environments: Non-stationary inference for directionality and other covariate effects
Jones, Matthew
Randell, David
Ewans, Kevin
Jonathan, Philip
OCEAN ENGINEERING, 2016, 119 : 30 - 46
[45] Predictive Learning Model in Cognitive Radio using Reinforcement Learning
Tubachi, Sharada
Venkatesan, Mithra
Kulkarni, A. V.
2017 IEEE INTERNATIONAL CONFERENCE ON POWER, CONTROL, SIGNALS AND INSTRUMENTATION ENGINEERING (ICPCSI), 2017, : 564 - 567
[46] Non-stationary time-varying vehicular channel characteristics for different roadside scattering environments
Li, Changzhen
Chen, Wei
Pei, Zhonghui
Chang, Fuxing
Yu, Junyi
Luo, Fan
SCIENTIFIC REPORTS, 2022, 12 (01)
[47] Density-based Core Support Extraction for Non-stationary Environments with Extreme Verification Latency
Ferreira, Raul S.
da Silva, Bruno M. A.
Teixeira, Wendell W.
Zimbrao, Geraldo
Alvim, Leandro
2018 7TH BRAZILIAN CONFERENCE ON INTELLIGENT SYSTEMS (BRACIS), 2018, : 181 - 187
[48] Power Laws Derived from a Bayesian Decision-Making Model in Non-Stationary Environments
Shinohara, Shuji
Manome, Nobuhito
Nakajima, Yoshihiro
Gunji, Yukio Pegio
Moriyama, Toru
Okamoto, Hiroshi
Mitsuyoshi, Shunji
Chung, Ung-il
SYMMETRY-BASEL, 2021, 13 (04)
[49] An experimental review of the ensemble-based data stream classification algorithms in non-stationary environments
Khezri, Shirin
Tanha, Jafar
Samadi, Negin
COMPUTERS & ELECTRICAL ENGINEERING, 2024, 118
[50] Object Manipulation in Marine Environments using Reinforcement Learning
Nader, Ahmed
Din, Muhayy Ud
Irfan, Mughni
Hussain, Irfan
IFAC PAPERSONLINE, 2024, 58 (20) : 215 - 222