Predictive reinforcement learning in non-stationary environments using weighted mixture policy

Cited by: 0
Authors
Pourshamsaei, Hossein [1 ]
Nobakhti, Amin [1 ]
Affiliations
[1] Sharif Univ Technol, Dept Elect Engn, Azadi Ave, Tehran 111554363, Iran
Keywords
Reinforcement learning; Non-stationary environments; Adaptive learning rate; Mixture policy; Predictive reference tracking; MODEL;
DOI
10.1016/j.asoc.2024.111305
CLC classification
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement Learning (RL) in non-stationary environments is a formidable challenge. In some applications, abrupt changes in the environment model can be anticipated in advance, yet the existing literature lacks a framework that proactively exploits such predictions to improve reward optimization. This paper introduces a methodology that leverages these predictions preemptively to maximize overall performance. It does so by forming a weighted mixture policy from the optimal policies of the prevailing and forthcoming environment models. To ensure safe learning, an adaptive learning rate is derived for training the weighted mixture policy, which theoretically guarantees monotonic performance improvement at each update. Empirical trials focus on a model-free predictive reference-tracking scenario with piecewise constant references. On the cart-pole position control problem, the proposed algorithm is shown to surpass prior techniques such as context Q-learning and RL with context detection in non-stationary environments. It also outperforms applying the individual optimal policy of each observed environment model (i.e., policies that do not use predictions).
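
As a rough illustration of the weighted-mixture idea described in the abstract, the minimal Python sketch below blends two action-probability vectors with a single weight. The function name, the fixed example weight, and the discrete-action setting are assumptions made for illustration only; the paper's actual method additionally derives an adaptive learning rate for safely training the mixture policy.

    # Hypothetical sketch of a weighted mixture policy (not the authors' code).
    # Assumes discrete actions and two given policies: one optimal for the
    # current environment model, one for the predicted upcoming model.
    import numpy as np

    def mixture_policy(pi_current, pi_next, w):
        """Blend two action-probability vectors with weight w in [0, 1].

        w = 0 follows the policy of the prevailing model only;
        w = 1 follows the policy of the forthcoming model only.
        """
        probs = (1.0 - w) * pi_current + w * pi_next
        return probs / probs.sum()  # renormalize against numerical drift

    # Example: three actions, weight shifted partly toward the predicted model
    pi_cur = np.array([0.7, 0.2, 0.1])   # optimal for the current model
    pi_nxt = np.array([0.1, 0.3, 0.6])   # optimal for the predicted model
    action = np.random.choice(3, p=mixture_policy(pi_cur, pi_nxt, w=0.4))

In this toy setting, increasing w before an anticipated model change gradually shifts behavior toward the policy that will be optimal after the change, rather than switching abruptly at the moment the change occurs.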
Pages: 16
Related papers
50 records in total
  • [21] Real-time dynamic pricing in a non-stationary environment using model-free reinforcement learning
    Rana, Rupal
    Oliveira, Fernando S.
    OMEGA-INTERNATIONAL JOURNAL OF MANAGEMENT SCIENCE, 2014, 47 : 116 - 126
  • [22] Accelerated Variant of Reinforcement Learning Algorithms for Light Control with Non-stationary User Behaviour
    Haddam, Nassim
    Boulakia, Benjamin Cohen
    Barth, Dominique
    PROCEEDINGS OF THE 11TH INTERNATIONAL CONFERENCE ON SMART CITIES AND GREEN ICT SYSTEMS (SMARTGREENS), 2022, : 78 - 85
  • [23] Fundamental Limits of Age-of-Information in Stationary and Non-stationary Environments
    Banerjee, Subhankar
    Bhattacharjee, Rajarshi
    Sinha, Abhishek
    2020 IEEE INTERNATIONAL SYMPOSIUM ON INFORMATION THEORY (ISIT), 2020, : 1741 - 1746
  • [24] Multi-Agent Combat in Non-Stationary Environments
    Li, Shengang
    Chi, Haoang
    Xie, Tao
    2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN), 2021,
  • [25] Online Learning for Non-Stationary A/B Tests
    Medina, Andres Munoz
    Vassilvitskii, Sergei
    Yin, Dong
    CIKM'18: PROCEEDINGS OF THE 27TH ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, 2018, : 317 - 326
  • [26] A Policy Search and Transfer Approach in the Non-stationary Environment
    Zhu F.
    Liu Q.
    Fu Q.-M.
    Chen D.-H.
    Wang H.
    Fu Y.-C.
    Tien Tzu Hsueh Pao/Acta Electronica Sinica, 2017, 45 (02): 257 - 266
  • [27] An RBF online learning scheme for non-stationary environments based on fuzzy means and Givens rotations
    Karamichailidou, Despina
    Koletsios, Sotirios
    Alexandridis, Alex
    NEUROCOMPUTING, 2022, 501 : 370 - 386
  • [28] A Model-free Reinforcement Learning Approach for the Energetic Control of a Building with Non-stationary User Behaviour
    Haddam, Nassim
    Boulakia, Benjamin Cohen
    Barth, Dominique
    2020 THE 4TH INTERNATIONAL CONFERENCE ON SMART GRID AND SMART CITIES (ICSGSC 2020), 2020, : 168 - 177
  • [29] Stochastic discretized learning-based weak estimation: a novel estimation method for non-stationary environments
    Yazidi, Anis
    Oommen, B. John
    Horn, Geir
    Granmo, Ole-Christoffer
    PATTERN RECOGNITION, 2016, 60 : 430 - 443
  • [30] Non-stationary Dueling Bandits for Online Learning to Rank
    Lu, Shiyin
    Miao, Yuan
    Yang, Ping
    Hu, Yao
    Zhang, Lijun
    WEB AND BIG DATA, PT II, APWEB-WAIM 2022, 2023, 13422 : 166 - 174