Model-free robust reinforcement learning via Polynomial Chaos

Cited by: 0
Authors
Liu, Jianxiang [1 ,3 ,5 ]
Wu, Faguo [1 ,3 ,4 ,5 ]
Zhang, Xiao [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Xueyuan Rd 37, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Xueyuan Rd 37, Beijing 100191, Peoples R China
[3] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
[5] Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust reinforcement learning; Uncertainty quantification; Function approximation; Generalized Polynomial Chaos; UNCERTAINTY; LEVEL;
DOI
10.1016/j.knosys.2024.112783
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancy between simulated and real-world environments in Reinforcement Learning (RL) training. RMDPs accommodate uncertainty about the real-world environment by taking a conservative approach that enhances the robustness of policy decisions. However, because robust value functions are difficult to estimate, the RMDP framework is hard to generalize to environments with large continuous state-action spaces. Our work focuses on model-free robust RL and proposes a model-free algorithm for continuous-space settings. We adopt a new perspective on uncertainty sets: the sets are parameterized, and the parameters obey specific stochastic distributions. We present a novel approach, RPC, that estimates the robust value function using generalized Polynomial Chaos (gPC), and we provide a proof guaranteeing the convergence of the algorithm. Our training framework is based on off-policy RL; gPC reduces the computational overhead and improves learning stability. The resulting algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments on continuous robot control tasks, and the results provide further evidence of the robustness of our algorithm.
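For intuition only, the sketch below illustrates the general idea of using a gPC expansion for uncertainty quantification of a value function; it is not the authors' RPC implementation. It assumes a single uncertainty-set parameter theta with a standard normal distribution and a hypothetical toy value function `toy_value`; the expansion coefficients are computed with Gauss-Hermite quadrature, and a mean-minus-standard-deviation surrogate stands in for a pessimistic (robust) value estimate.

```python
# Illustrative sketch only (not the authors' RPC algorithm): quantify the effect of
# an uncertain transition parameter theta ~ N(0, 1) on a value function by projecting
# it onto probabilists' Hermite polynomials (a gPC basis for Gaussian inputs).
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def toy_value(state, theta):
    # Hypothetical value of `state` under a transition model perturbed by theta.
    return np.cos(state + 0.3 * theta) - 0.1 * theta ** 2

def gpc_coefficients(state, order=4, quad_deg=16):
    # Project V(state, .) onto He_0, ..., He_order, which are orthogonal
    # with respect to the standard normal density.
    nodes, weights = hermegauss(quad_deg)      # quadrature rule for the weight exp(-x^2 / 2)
    w = weights / np.sqrt(2.0 * np.pi)         # normalize so the weights sum to 1
    vals = toy_value(state, nodes)
    coeffs = []
    for k in range(order + 1):
        basis = hermeval(nodes, [0.0] * k + [1.0])                    # He_k at the nodes
        coeffs.append(np.sum(w * vals * basis) / math.factorial(k))   # E[He_k^2] = k!
    return np.array(coeffs)

def pessimistic_value(state, kappa=1.0, order=4):
    # Mean-minus-std surrogate for a robust value estimate: for a Hermite
    # expansion, mean = c_0 and Var = sum_{k>=1} k! * c_k^2.
    c = gpc_coefficients(state, order=order)
    var = sum(math.factorial(k) * c[k] ** 2 for k in range(1, order + 1))
    return c[0] - kappa * np.sqrt(var)

print(pessimistic_value(state=0.5))
```

In the paper's setting, the expansion would be applied to a learned critic and combined with off-policy TD3 updates; the mean-minus-std bound above is only one possible conservative surrogate, chosen here to keep the sketch self-contained.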
Pages: 11