Model-free robust reinforcement learning via Polynomial Chaos

Cited by: 0
|
Authors
Liu, Jianxiang [1 ,3 ,5 ]
Wu, Faguo [1 ,3 ,4 ,5 ]
Zhang, Xiao [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Xueyuan Rd 37, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Xueyuan Rd 37, Beijing 100191, Peoples R China
[3] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
[5] Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust reinforcement learning; Uncertainty quantification; Function approximation; Generalized Polynomial Chaos; UNCERTAINTY; LEVEL;
DOI
10.1016/j.knosys.2024.112783
CLC number
TP18 [Artificial Intelligence Theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancy between simulated and real-world environments in Reinforcement Learning (RL) training. RMDPs accommodate the uncertainty of real-world environments by taking a conservative approach that enhances the robustness of policy decisions. However, because robust value functions are difficult to estimate, the RMDP framework is hard to generalize to environments with large continuous state-action spaces. Our work focuses on model-free robust RL and proposes a model-free algorithm for the continuous-space setting. We adopt a new perspective on uncertainty sets: the sets are parameterized, and the parameters obey specific stochastic distributions. We present a novel approach, RPC, that estimates the robust value function using generalized Polynomial Chaos (gPC), and we prove that the algorithm converges. Our training framework is based on off-policy RL; the gPC estimator reduces computational overhead and improves learning stability. The resulting algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments on a continuous robot control task; the experimental results provide further evidence of the robustness of our algorithm.
Pages: 11
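
To make the gPC-based robust value estimation described in the abstract concrete, the Python sketch below is a minimal illustration, not the paper's implementation: it expands a toy value function V(s; xi) in Legendre polynomials of an uncertain model parameter xi ~ Uniform(-1, 1) and reads a conservative estimate off the resulting moments. The placeholder value_under_param and the "mean minus beta times std" robustification are assumptions made purely for illustration.

# Minimal gPC sketch (assumed setup, not the paper's exact method):
# the uncertainty-set parameter xi is treated as a random variable,
# V(s; xi) is projected onto Legendre polynomials of xi, and a
# conservative value is computed from the surrogate's moments.
import numpy as np
from numpy.polynomial import legendre


def value_under_param(xi: float) -> float:
    """Placeholder critic evaluation under model parameter xi (toy example)."""
    return 1.0 / (1.2 + xi)


def gpc_coefficients(order: int, n_quad: int = 16) -> np.ndarray:
    """Project V(s; xi) onto Legendre polynomials via Gauss-Legendre quadrature."""
    nodes, weights = legendre.leggauss(n_quad)
    values = np.array([value_under_param(x) for x in nodes])
    coeffs = np.zeros(order + 1)
    for k in range(order + 1):
        basis_k = legendre.legval(nodes, np.eye(order + 1)[k])
        # c_k = E[V * P_k] / E[P_k^2] for xi ~ Uniform(-1, 1)
        coeffs[k] = (2 * k + 1) / 2.0 * np.sum(weights * values * basis_k)
    return coeffs


def robust_value(coeffs: np.ndarray, beta: float = 1.0) -> float:
    """Conservative value estimate from the gPC moments (mean - beta * std)."""
    mean = coeffs[0]
    var = sum(c**2 / (2 * k + 1) for k, c in enumerate(coeffs) if k > 0)
    return mean - beta * np.sqrt(var)


coeffs = gpc_coefficients(order=4)
print("gPC surrogate coefficients:", coeffs)
print("robust (conservative) value:", robust_value(coeffs))

In this sketch the conservative estimate penalizes the mean by the surrogate's standard deviation; the paper's actual robustification over the parameterized uncertainty set may differ.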