Model-free robust reinforcement learning via Polynomial Chaos

Cited by: 0
Authors
Liu, Jianxiang [1 ,3 ,5 ]
Wu, Faguo [1 ,3 ,4 ,5 ]
Zhang, Xiao [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Xueyuan Rd 37, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Xueyuan Rd 37, Beijing 100191, Peoples R China
[3] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
[5] Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust reinforcement learning; Uncertainty quantification; Function approximation; Generalized Polynomial Chaos; UNCERTAINTY; LEVEL;
DOI
10.1016/j.knosys.2024.112783
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancy between simulated and real-world environments in Reinforcement Learning (RL) training. RMDP accommodates the uncertainty of real-world environments by taking a conservative approach that enhances the robustness of policy decisions. However, because robust value function estimation is difficult, the RMDP framework is hard to generalize to environments with large continuous state-action spaces. Our work focuses on model-free robust RL and proposes a model-free algorithm for continuous-space settings. We adopt a new perspective on uncertainty sets: the sets are parameterized, and the parameters follow specific stochastic distributions. We present a novel approach, RPC, that estimates the robust value function using generalized Polynomial Chaos (gPC), and we provide a proof that guarantees the convergence of the algorithm. Our training framework is based on off-policy RL, which reduces the computational overhead introduced by gPC and improves learning stability. The resulting algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments to evaluate its performance on a continuous robot control task, and the experimental results provide further evidence of the robustness of our algorithm.
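
To make the gPC idea in the abstract concrete, the following is a minimal, self-contained sketch (not the paper's RPC algorithm) of how a value estimate can be expanded in an orthogonal polynomial basis over a parameterized uncertainty set. It assumes a single perturbation parameter xi distributed uniformly on [-1, 1], for which Legendre polynomials are the matching basis, and a hypothetical rollout_return(xi) routine standing in for policy evaluation under the perturbed dynamics; both are illustrative assumptions, not details taken from the paper.

    import numpy as np
    from numpy.polynomial import legendre

    ORDER = 4  # truncation order of the gPC expansion

    def rollout_return(xi: float) -> float:
        # Hypothetical placeholder: the return obtained by evaluating the
        # current policy in an environment perturbed by parameter xi.
        return -1.0 * xi ** 2 + 0.1 * xi + 5.0  # toy response surface

    # Gauss-Legendre quadrature nodes and weights on [-1, 1]; the uniform
    # density 1/2 is folded into the projections below.
    nodes, weights = legendre.leggauss(ORDER + 1)

    # Evaluate the (expensive) response only at the quadrature nodes.
    values = np.array([rollout_return(x) for x in nodes])

    # Project onto the Legendre basis: c_k = E[V(xi) P_k(xi)] / E[P_k(xi)^2].
    coeffs = np.zeros(ORDER + 1)
    for k in range(ORDER + 1):
        basis_k = legendre.legval(nodes, np.eye(ORDER + 1)[k])  # P_k at nodes
        num = np.sum(weights * values * basis_k) / 2.0  # E[V * P_k]
        den = np.sum(weights * basis_k ** 2) / 2.0      # E[P_k^2]
        coeffs[k] = num / den

    # The cheap surrogate V(xi) ~ sum_k c_k P_k(xi) can be queried densely to
    # obtain a conservative (worst case over the support) value estimate.
    grid = np.linspace(-1.0, 1.0, 201)
    surrogate = legendre.legval(grid, coeffs)
    print("gPC mean value      :", coeffs[0])        # c_0 equals the mean
    print("pessimistic estimate:", surrogate.min())  # worst case over the set

The zeroth coefficient recovers the mean value, and minimizing the cheap surrogate over the parameter support gives a pessimistic estimate in the spirit of robust value estimation; in the paper's setting such an expansion would be combined with an off-policy learner such as TD3, which the sketch does not attempt to show.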
Pages: 11
Related Papers
50 records in total
  • [31] Model-free learning control of neutralization processes using reinforcement learning
    Syafiie, S.
    Tadeo, F.
    Martinez, E.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2007, 20 (06) : 767 - 782
  • [32] Model-based and Model-free Reinforcement Learning for Visual Servoing
    Farahmand, Amir Massoud
    Shademan, Azad
    Jagersand, Martin
    Szepesvari, Csaba
    ICRA: 2009 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-7, 2009, : 4135 - 4142
  • [33] Linear Quadratic Control Using Model-Free Reinforcement Learning
    Yaghmaie, Farnaz Adib
    Gustafsson, Fredrik
    Ljung, Lennart
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (02) : 737 - 752
  • [34] On Distributed Model-Free Reinforcement Learning Control With Stability Guarantee
    Mukherjee, Sayak
    Vu, Thanh Long
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (05): : 1615 - 1620
  • [35] Model-Free Reinforcement Learning of Impedance Control in Stochastic Environments
    Stulp, Freek
    Buchli, Jonas
    Ellmer, Alice
    Mistry, Michael
    Theodorou, Evangelos A.
    Schaal, Stefan
    IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, 2012, 4 (04) : 330 - 341
  • [36] Model-Free Recurrent Reinforcement Learning for AUV Horizontal Control
    Huo, Yujia
    Li, Yiping
    Feng, Xisheng
    3RD INTERNATIONAL CONFERENCE ON AUTOMATION, CONTROL AND ROBOTICS ENGINEERING (CACRE 2018), 2018, 428
  • [37] Model-Free Linear Noncausal Optimal Control of Wave Energy Converters via Reinforcement Learning
    Zhan, Siyuan
    Ringwood, John V.
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2024, 32 (06) : 2164 - 2177
  • [38] Limit Reachability for Model-Free Reinforcement Learning of ω-Regular Objectives
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    PROCEEDINGS OF THE 5TH INTERNATIONAL WORKSHOP ON SYMBOLIC-NUMERIC METHODS FOR REASONING ABOUT CPS AND IOT (SNR 2019), 2019, : 16 - 18
  • [39] Model-Free Control for Soft Manipulators based on Reinforcement Learning
    You, Xuanke
    Zhang, Yixiao
    Chen, Xiaotong
    Liu, Xinghua
    Wang, Zhanchi
    Jiang, Hao
    Chen, Xiaoping
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 2909 - 2915
  • [40] Model-Free Reinforcement Learning with the Decision-Estimation Coefficient
    Foster, Dylan J.
    Golowich, Noah
    Qian, Jian
    Rakhlin, Alexander
    Sekhari, Ayush
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,