Model-free robust reinforcement learning via Polynomial Chaos

Cited by: 0
|
Authors
Liu, Jianxiang [1 ,3 ,5 ]
Wu, Faguo [1 ,3 ,4 ,5 ]
Zhang, Xiao [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Xueyuan Rd 37, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Xueyuan Rd 37, Beijing 100191, Peoples R China
[3] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
[5] Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust reinforcement learning; Uncertainty quantification; Function approximation; Generalized Polynomial Chaos; UNCERTAINTY; LEVEL;
DOI
10.1016/j.knosys.2024.112783
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancy between simulated and real-world environments in Reinforcement Learning (RL) training. RMDPs accommodate the uncertainty of real-world environments by taking a conservative approach that enhances the robustness of policy decisions. However, because robust value functions are difficult to estimate, the RMDP framework is hard to generalize to environments with large continuous state-action spaces. Our work focuses on model-free robust RL and proposes a model-free algorithm for the continuous-space setting. We adopt a new perspective on uncertainty sets: the sets are parameterized, and the parameters obey specific stochastic distributions. We present a novel approach, RPC, that estimates the robust value function using generalized Polynomial Chaos (gPC), and we prove that the algorithm converges. Our training framework is based on off-policy RL; the gPC estimator reduces computational overhead and improves learning stability. The resulting algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments on a continuous robot control task; the experimental results provide further evidence of the robustness of our algorithm.
Pages: 11
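
To make the gPC-based robust value estimation described in the abstract concrete, the Python sketch below is a minimal illustration, not the paper's implementation: it expands a toy value function V(s; xi) in Legendre polynomials of an uncertain model parameter xi ~ Uniform(-1, 1) and reads a conservative estimate off the resulting moments. The placeholder value_under_param and the "mean minus beta times std" robustification are assumptions made purely for illustration.

# Minimal gPC sketch (assumed setup, not the paper's exact method):
# the uncertainty-set parameter xi is treated as a random variable,
# V(s; xi) is projected onto Legendre polynomials of xi, and a
# conservative value is computed from the surrogate's moments.
import numpy as np
from numpy.polynomial import legendre


def value_under_param(xi: float) -> float:
    """Placeholder critic evaluation under model parameter xi (toy example)."""
    return 1.0 / (1.2 + xi)


def gpc_coefficients(order: int, n_quad: int = 16) -> np.ndarray:
    """Project V(s; xi) onto Legendre polynomials via Gauss-Legendre quadrature."""
    nodes, weights = legendre.leggauss(n_quad)
    values = np.array([value_under_param(x) for x in nodes])
    coeffs = np.zeros(order + 1)
    for k in range(order + 1):
        basis_k = legendre.legval(nodes, np.eye(order + 1)[k])
        # c_k = E[V * P_k] / E[P_k^2] for xi ~ Uniform(-1, 1)
        coeffs[k] = (2 * k + 1) / 2.0 * np.sum(weights * values * basis_k)
    return coeffs


def robust_value(coeffs: np.ndarray, beta: float = 1.0) -> float:
    """Conservative value estimate from the gPC moments (mean - beta * std)."""
    mean = coeffs[0]
    var = sum(c**2 / (2 * k + 1) for k, c in enumerate(coeffs) if k > 0)
    return mean - beta * np.sqrt(var)


coeffs = gpc_coefficients(order=4)
print("gPC surrogate coefficients:", coeffs)
print("robust (conservative) value:", robust_value(coeffs))

In this sketch the conservative estimate penalizes the mean by the surrogate's standard deviation; the paper's actual robustification over the parameterized uncertainty set may differ.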