Model-free robust reinforcement learning via Polynomial Chaos

Cited by: 0
Authors
Liu, Jianxiang [1 ,3 ,5 ]
Wu, Faguo [1 ,3 ,4 ,5 ]
Zhang, Xiao [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Xueyuan Rd 37, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Xueyuan Rd 37, Beijing 100191, Peoples R China
[3] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
[5] Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust reinforcement learning; Uncertainty quantification; Function approximation; Generalized Polynomial Chaos; UNCERTAINTY; LEVEL;
DOI
10.1016/j.knosys.2024.112783
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancies between simulated and real-world environments in Reinforcement Learning (RL) training. RMDPs accommodate the uncertainty of real-world environments by taking a conservative approach that enhances the robustness of policy decisions. However, because robust value functions are difficult to estimate, the RMDP framework is hard to generalize to environments with large continuous state-action spaces. Our work focuses on model-free robust RL and proposes a model-free algorithm for continuous-space settings. We adopt a new perspective on uncertainty sets: the uncertainty sets are parameterized, and the parameters follow specific stochastic distributions. We present a novel approach, RPC, that estimates the robust value function using generalized Polynomial Chaos (gPC), and we prove that the algorithm converges. Our training framework builds on off-policy RL, reducing the computational overhead associated with gPC and improving learning stability. The algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments on a continuous robot control task; the experimental results provide further evidence of our algorithm's robustness.
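The abstract's central device, estimating a robust value function by expanding it in generalized Polynomial Chaos over a stochastically distributed uncertainty-set parameter, can be illustrated with a small sketch. The code below is not the paper's RPC algorithm; it is a generic gPC illustration for a hypothetical scalar value estimate that depends on one uncertain parameter xi ~ N(0, 1). The names value_fn, order, n_quad, and the conservative margin kappa are illustrative assumptions, not quantities from the paper.

```python
# Minimal sketch of generalized Polynomial Chaos (gPC) for uncertainty
# quantification. NOT the paper's RPC algorithm: it only illustrates how a
# quantity depending on a random parameter xi ~ N(0, 1) is projected onto
# probabilists' Hermite polynomials and how its statistics are read off the
# expansion coefficients.
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def gpc_coefficients(value_fn, order=4, n_quad=16):
    """Project value_fn(xi), xi ~ N(0,1), onto He_0..He_order.

    c_k = E[value_fn(xi) * He_k(xi)] / E[He_k(xi)^2], with E[He_k^2] = k!.
    The expectation is evaluated by Gauss-HermiteE quadrature.
    """
    nodes, weights = hermegauss(n_quad)        # weight function exp(-x^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # normalize to the N(0,1) density
    values = value_fn(nodes)
    coeffs = np.zeros(order + 1)
    for k in range(order + 1):
        basis_k = hermeval(nodes, np.eye(order + 1)[k])  # He_k at the nodes
        coeffs[k] = np.sum(weights * values * basis_k) / math.factorial(k)
    return coeffs


def gpc_mean_var(coeffs):
    """Mean and variance follow directly from the coefficients."""
    mean = coeffs[0]
    var = sum(math.factorial(k) * coeffs[k] ** 2 for k in range(1, len(coeffs)))
    return mean, var


# Hypothetical usage: a toy value estimate that depends nonlinearly on xi.
value_fn = lambda xi: np.exp(0.3 * xi) + 0.1 * xi ** 2
mean, var = gpc_mean_var(gpc_coefficients(value_fn, order=4))
# A conservative (robust-style) estimate could penalize uncertainty,
# e.g. mean - kappa * sqrt(var) for some illustrative margin kappa.
robust_estimate = mean - 1.0 * np.sqrt(var)
print(f"mean={mean:.4f}, var={var:.4f}, robust={robust_estimate:.4f}")
```

The point of the expansion is that an expectation over the uncertain parameter reduces to a handful of deterministic evaluations at quadrature nodes, with statistics such as mean and variance obtained directly from the coefficients; in the paper this machinery is coupled with off-policy TD3 training over the RMDP's parameterized uncertainty set.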
Pages: 11
Related Papers
50 records in total
  • [1] Model-Free μ Synthesis via Adversarial Reinforcement Learning
    Keivan, Darioush
    Havens, Aaron
    Seiler, Peter
    Dullerud, Geir
    Hu, Bin
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 3335 - 3341
  • [2] Plume Tracing via Model-Free Reinforcement Learning Method
    Hu, Hangkai
    Song, Shiji
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (08) : 2515 - 2527
  • [3] Safe Reinforcement Learning via a Model-Free Safety Certifier
    Modares, Amir
    Sadati, Nasser
    Esmaeili, Babak
    Yaghmaie, Farnaz Adib
    Modares, Hamidreza
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3302 - 3311
  • [4] Depth Control of Model-Free AUVs via Reinforcement Learning
    Wu, Hui
    Song, Shiji
    You, Keyou
    Wu, Cheng
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2019, 49 (12): : 2499 - 2510
  • [5] Robust Model-Free Reinforcement Learning Based Current Control of PMSM Drives
    Farah, Nabil
    Lei, Gang
    Zhu, Jianguo
    Guo, Youguang
    IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, 2025, 11 (01): : 1061 - 1076
  • [6] Learning Representations in Model-Free Hierarchical Reinforcement Learning
    Rafati, Jacob
    Noelle, David C.
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 10009 - 10010
  • [7] Model-Free Trajectory Optimization for Reinforcement Learning
    Akrour, Riad
    Abdolmaleki, Abbas
    Abdulsamad, Hany
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [8] Model-Free Active Exploration in Reinforcement Learning
    Russo, Alessio
    Proutiere, Alexandre
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] Model-Free Quantum Control with Reinforcement Learning
    Sivak, V. V.
    Eickbusch, A.
    Liu, H.
    Royer, B.
    Tsioutsios, I.
    Devoret, M. H.
    PHYSICAL REVIEW X, 2022, 12 (01)
  • [10] Online Nonstochastic Model-Free Reinforcement Learning
    Ghai, Udaya
    Gupta, Arushi
    Xia, Wenhan
    Singh, Karan
    Hazan, Elad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,