Model-free robust reinforcement learning via Polynomial Chaos

Cited by: 0
Authors
Liu, Jianxiang [1 ,3 ,5 ]
Wu, Faguo [1 ,3 ,4 ,5 ]
Zhang, Xiao [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Xueyuan Rd 37, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Xueyuan Rd 37, Beijing 100191, Peoples R China
[3] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
[5] Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust reinforcement learning; Uncertainty quantification; Function approximation; Generalized Polynomial Chaos; UNCERTAINTY; LEVEL;
DOI
10.1016/j.knosys.2024.112783
CLC Number
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancies between simulated and real-world environments in Reinforcement Learning (RL) training. RMDPs accommodate the uncertainty of real-world environments by taking a conservative approach that enhances the robustness of policy decisions. However, because robust value functions are difficult to estimate, the RMDP framework is hard to generalize to environments with large continuous state-action spaces. Our work focuses on model-free robust RL and proposes a model-free algorithm for continuous-space settings. We adopt a new perspective on uncertainty sets: the uncertainty sets are parameterized, and the parameters follow specific stochastic distributions. We present a novel approach, RPC, that estimates the robust value function using generalized Polynomial Chaos (gPC), and we prove that the algorithm converges. Our training framework builds on off-policy RL, reducing the computational overhead associated with gPC and improving learning stability. The algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments on a continuous robot control task; the experimental results provide further evidence of our algorithm's robustness.
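The abstract's central device, estimating a robust value function by expanding it in generalized Polynomial Chaos over a stochastically distributed uncertainty-set parameter, can be illustrated with a small sketch. The code below is not the paper's RPC algorithm; it is a generic gPC illustration for a hypothetical scalar value estimate that depends on one uncertain parameter xi ~ N(0, 1). The names value_fn, order, n_quad, and the conservative margin kappa are illustrative assumptions, not quantities from the paper.

```python
# Minimal sketch of generalized Polynomial Chaos (gPC) for uncertainty
# quantification. NOT the paper's RPC algorithm: it only illustrates how a
# quantity depending on a random parameter xi ~ N(0, 1) is projected onto
# probabilists' Hermite polynomials and how its statistics are read off the
# expansion coefficients.
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval


def gpc_coefficients(value_fn, order=4, n_quad=16):
    """Project value_fn(xi), xi ~ N(0,1), onto He_0..He_order.

    c_k = E[value_fn(xi) * He_k(xi)] / E[He_k(xi)^2], with E[He_k^2] = k!.
    The expectation is evaluated by Gauss-HermiteE quadrature.
    """
    nodes, weights = hermegauss(n_quad)        # weight function exp(-x^2 / 2)
    weights = weights / np.sqrt(2.0 * np.pi)   # normalize to the N(0,1) density
    values = value_fn(nodes)
    coeffs = np.zeros(order + 1)
    for k in range(order + 1):
        basis_k = hermeval(nodes, np.eye(order + 1)[k])  # He_k at the nodes
        coeffs[k] = np.sum(weights * values * basis_k) / math.factorial(k)
    return coeffs


def gpc_mean_var(coeffs):
    """Mean and variance follow directly from the coefficients."""
    mean = coeffs[0]
    var = sum(math.factorial(k) * coeffs[k] ** 2 for k in range(1, len(coeffs)))
    return mean, var


# Hypothetical usage: a toy value estimate that depends nonlinearly on xi.
value_fn = lambda xi: np.exp(0.3 * xi) + 0.1 * xi ** 2
mean, var = gpc_mean_var(gpc_coefficients(value_fn, order=4))
# A conservative (robust-style) estimate could penalize uncertainty,
# e.g. mean - kappa * sqrt(var) for some illustrative margin kappa.
robust_estimate = mean - 1.0 * np.sqrt(var)
print(f"mean={mean:.4f}, var={var:.4f}, robust={robust_estimate:.4f}")
```

The point of the expansion is that an expectation over the uncertain parameter reduces to a handful of deterministic evaluations at quadrature nodes, with statistics such as mean and variance obtained directly from the coefficients; in the paper this machinery is coupled with off-policy TD3 training over the RMDP's parameterized uncertainty set.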
Pages: 11
Related Papers
50 records in total
  • [1] Model-Free μ Synthesis via Adversarial Reinforcement Learning
    Keivan, Darioush
    Havens, Aaron
    Seiler, Peter
    Dullerud, Geir
    Hu, Bin
    2022 AMERICAN CONTROL CONFERENCE, ACC, 2022, : 3335 - 3341
  • [2] Plume Tracing via Model-Free Reinforcement Learning Method
    Hu, Hangkai
    Song, Shiji
    Chen, C. L. Philip
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30 (08) : 2515 - 2527
  • [3] Safe Reinforcement Learning via a Model-Free Safety Certifier
    Modares, Amir
    Sadati, Nasser
    Esmaeili, Babak
    Yaghmaie, Farnaz Adib
    Modares, Hamidreza
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2024, 35 (03) : 3302 - 3311
  • [4] Depth Control of Model-Free AUVs via Reinforcement Learning
    Wu, Hui
    Song, Shiji
    You, Keyou
    Wu, Cheng
    IEEE TRANSACTIONS ON SYSTEMS MAN CYBERNETICS-SYSTEMS, 2019, 49 (12): : 2499 - 2510
  • [5] Robust Model-Free Reinforcement Learning Based Current Control of PMSM Drives
    Farah, Nabil
    Lei, Gang
    Zhu, Jianguo
    Guo, Youguang
    IEEE TRANSACTIONS ON TRANSPORTATION ELECTRIFICATION, 2025, 11 (01): : 1061 - 1076
  • [6] Learning Representations in Model-Free Hierarchical Reinforcement Learning
    Rafati, Jacob
    Noelle, David C.
    THIRTY-THIRD AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FIRST INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE / NINTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2019, : 10009 - 10010
  • [7] Model-Free Trajectory Optimization for Reinforcement Learning
    Akrour, Riad
    Abdolmaleki, Abbas
    Abdulsamad, Hany
    Neumann, Gerhard
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 48, 2016, 48
  • [8] Model-Free Active Exploration in Reinforcement Learning
    Russo, Alessio
    Proutiere, Alexandre
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [9] Model-Free Quantum Control with Reinforcement Learning
    Sivak, V. V.
    Eickbusch, A.
    Liu, H.
    Royer, B.
    Tsioutsios, I.
    Devoret, M. H.
    PHYSICAL REVIEW X, 2022, 12 (01)
  • [10] Online Nonstochastic Model-Free Reinforcement Learning
    Ghai, Udaya
    Gupta, Arushi
    Xia, Wenhan
    Singh, Karan
    Hazan, Elad
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,