Model-free robust reinforcement learning via Polynomial Chaos

Cited by: 0
Authors
Liu, Jianxiang [1 ,3 ,5 ]
Wu, Faguo [1 ,3 ,4 ,5 ]
Zhang, Xiao [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Xueyuan Rd 37, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Xueyuan Rd 37, Beijing 100191, Peoples R China
[3] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
[5] Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust reinforcement learning; Uncertainty quantification; Function approximation; Generalized Polynomial Chaos; UNCERTAINTY; LEVEL;
DOI
10.1016/j.knosys.2024.112783
Chinese Library Classification (CLC)
TP18 [Theory of Artificial Intelligence];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancy between simulated and real-world environments in Reinforcement Learning (RL) training. RMDP accommodates the uncertainty of real-world environments by taking a conservative approach that enhances the robustness of policy decisions. However, because robust value function estimation is difficult, the RMDP framework is hard to generalize to environments with large continuous state-action spaces. Our work focuses on model-free robust RL and proposes a model-free algorithm for continuous-space settings. We adopt a new perspective on uncertainty sets: the sets are parameterized, and the parameters follow specific stochastic distributions. We present a novel approach, RPC, that estimates the robust value function using generalized Polynomial Chaos (gPC), and we provide a proof that guarantees the convergence of the algorithm. Our training framework is based on off-policy RL, which reduces the computational overhead introduced by gPC and improves learning stability. The resulting algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments to evaluate its performance on a continuous robot control task, and the experimental results provide further evidence of the robustness of our algorithm.
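
To make the gPC idea in the abstract concrete, the following is a minimal, self-contained sketch (not the paper's RPC algorithm) of how a value estimate can be expanded in an orthogonal polynomial basis over a parameterized uncertainty set. It assumes a single perturbation parameter xi distributed uniformly on [-1, 1], for which Legendre polynomials are the matching basis, and a hypothetical rollout_return(xi) routine standing in for policy evaluation under the perturbed dynamics; both are illustrative assumptions, not details taken from the paper.

    import numpy as np
    from numpy.polynomial import legendre

    ORDER = 4  # truncation order of the gPC expansion

    def rollout_return(xi: float) -> float:
        # Hypothetical placeholder: the return obtained by evaluating the
        # current policy in an environment perturbed by parameter xi.
        return -1.0 * xi ** 2 + 0.1 * xi + 5.0  # toy response surface

    # Gauss-Legendre quadrature nodes and weights on [-1, 1]; the uniform
    # density 1/2 is folded into the projections below.
    nodes, weights = legendre.leggauss(ORDER + 1)

    # Evaluate the (expensive) response only at the quadrature nodes.
    values = np.array([rollout_return(x) for x in nodes])

    # Project onto the Legendre basis: c_k = E[V(xi) P_k(xi)] / E[P_k(xi)^2].
    coeffs = np.zeros(ORDER + 1)
    for k in range(ORDER + 1):
        basis_k = legendre.legval(nodes, np.eye(ORDER + 1)[k])  # P_k at nodes
        num = np.sum(weights * values * basis_k) / 2.0  # E[V * P_k]
        den = np.sum(weights * basis_k ** 2) / 2.0      # E[P_k^2]
        coeffs[k] = num / den

    # The cheap surrogate V(xi) ~ sum_k c_k P_k(xi) can be queried densely to
    # obtain a conservative (worst case over the support) value estimate.
    grid = np.linspace(-1.0, 1.0, 201)
    surrogate = legendre.legval(grid, coeffs)
    print("gPC mean value      :", coeffs[0])        # c_0 equals the mean
    print("pessimistic estimate:", surrogate.min())  # worst case over the set

The zeroth coefficient recovers the mean value, and minimizing the cheap surrogate over the parameter support gives a pessimistic estimate in the spirit of robust value estimation; in the paper's setting such an expansion would be combined with an off-policy learner such as TD3, which the sketch does not attempt to show.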
Pages: 11
Related Papers
50 records in total
  • [31] Model-free learning control of neutralization processes using reinforcement learning
    Syafiie, S.
    Tadeo, F.
    Martinez, E.
    ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2007, 20 (06) : 767 - 782
  • [32] Model-based and Model-free Reinforcement Learning for Visual Servoing
    Farahmand, Amir Massoud
    Shademan, Azad
    Jagersand, Martin
    Szepesvari, Csaba
    ICRA: 2009 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION, VOLS 1-7, 2009, : 4135 - 4142
  • [33] Linear Quadratic Control Using Model-Free Reinforcement Learning
    Yaghmaie, Farnaz Adib
    Gustafsson, Fredrik
    Ljung, Lennart
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (02) : 737 - 752
  • [34] On Distributed Model-Free Reinforcement Learning Control With Stability Guarantee
    Mukherjee, Sayak
    Vu, Thanh Long
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (05): : 1615 - 1620
  • [35] Model-Free Reinforcement Learning of Impedance Control in Stochastic Environments
    Stulp, Freek
    Buchli, Jonas
    Ellmer, Alice
    Mistry, Michael
    Theodorou, Evangelos A.
    Schaal, Stefan
    IEEE TRANSACTIONS ON AUTONOMOUS MENTAL DEVELOPMENT, 2012, 4 (04) : 330 - 341
  • [36] Model-Free Recurrent Reinforcement Learning for AUV Horizontal Control
    Huo, Yujia
    Li, Yiping
    Feng, Xisheng
    3RD INTERNATIONAL CONFERENCE ON AUTOMATION, CONTROL AND ROBOTICS ENGINEERING (CACRE 2018), 2018, 428
  • [37] Model-Free Linear Noncausal Optimal Control of Wave Energy Converters via Reinforcement Learning
    Zhan, Siyuan
    Ringwood, John V.
    IEEE TRANSACTIONS ON CONTROL SYSTEMS TECHNOLOGY, 2024, 32 (06) : 2164 - 2177
  • [38] Limit Reachability for Model-Free Reinforcement Learning of ω-Regular Objectives
    Hahn, Ernst Moritz
    Perez, Mateo
    Schewe, Sven
    Somenzi, Fabio
    Trivedi, Ashutosh
    Wojtczak, Dominik
    PROCEEDINGS OF THE 5TH INTERNATIONAL WORKSHOP ON SYMBOLIC-NUMERIC METHODS FOR REASONING ABOUT CPS AND IOT (SNR 2019), 2019, : 16 - 18
  • [39] Model-Free Control for Soft Manipulators based on Reinforcement Learning
    You, Xuanke
    Zhang, Yixiao
    Chen, Xiaotong
    Liu, Xinghua
    Wang, Zhanchi
    Jiang, Hao
    Chen, Xiaoping
    2017 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), 2017, : 2909 - 2915
  • [40] Model-Free Reinforcement Learning with the Decision-Estimation Coefficient
    Foster, Dylan J.
    Golowich, Noah
    Qian, Jian
    Rakhlin, Alexander
    Sekhari, Ayush
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,