Model-free robust reinforcement learning via Polynomial Chaos

Cited by: 0
Authors
Liu, Jianxiang [1 ,3 ,5 ]
Wu, Faguo [1 ,3 ,4 ,5 ]
Zhang, Xiao [2 ,3 ,4 ,5 ]
Affiliations
[1] Beihang Univ, Inst Artificial Intelligence, Xueyuan Rd 37, Beijing 100191, Peoples R China
[2] Beihang Univ, Sch Math Sci, Xueyuan Rd 37, Beijing 100191, Peoples R China
[3] Beihang Univ, Key Lab Math Informat & Behav Semant LMIB, Beijing 100191, Peoples R China
[4] Zhongguancun Lab, Beijing 100194, Peoples R China
[5] Beijing Adv Innovat Ctr Future Blockchain & Privac, Beijing 100191, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Robust reinforcement learning; Uncertainty quantification; Function approximation; Generalized Polynomial Chaos; UNCERTAINTY; LEVEL;
DOI
10.1016/j.knosys.2024.112783
CLC number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
In recent years, the Robust Markov Decision Process (RMDP) has become an important modeling framework for addressing the discrepancy between simulated and real-world environments in Reinforcement Learning (RL) training. RMDPs accommodate uncertainty about the real-world environment by taking a conservative approach that enhances the robustness of policy decisions. However, because robust value functions are difficult to estimate, the RMDP framework is hard to generalize to environments with large continuous state-action spaces. Our work focuses on model-free robust RL and proposes a model-free algorithm for continuous-space settings. We adopt a new perspective on uncertainty sets: the sets are parameterized, and the parameters obey specific stochastic distributions. We present a novel approach, RPC, that estimates the robust value function using generalized Polynomial Chaos (gPC), and we provide a proof guaranteeing the convergence of the algorithm. Our training framework is based on off-policy RL; gPC reduces the computational overhead and improves learning stability. The resulting algorithm handles continuous tasks and guarantees robustness without incurring excessive computational overhead. We combine RPC with the TD3 method and conduct several experiments on continuous robot control tasks, and the results provide further evidence of the robustness of our algorithm.
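For intuition only, the sketch below illustrates the general idea of using a gPC expansion for uncertainty quantification of a value function; it is not the authors' RPC implementation. It assumes a single uncertainty-set parameter theta with a standard normal distribution and a hypothetical toy value function `toy_value`; the expansion coefficients are computed with Gauss-Hermite quadrature, and a mean-minus-standard-deviation surrogate stands in for a pessimistic (robust) value estimate.

```python
# Illustrative sketch only (not the authors' RPC algorithm): quantify the effect of
# an uncertain transition parameter theta ~ N(0, 1) on a value function by projecting
# it onto probabilists' Hermite polynomials (a gPC basis for Gaussian inputs).
import math
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def toy_value(state, theta):
    # Hypothetical value of `state` under a transition model perturbed by theta.
    return np.cos(state + 0.3 * theta) - 0.1 * theta ** 2

def gpc_coefficients(state, order=4, quad_deg=16):
    # Project V(state, .) onto He_0, ..., He_order, which are orthogonal
    # with respect to the standard normal density.
    nodes, weights = hermegauss(quad_deg)      # quadrature rule for the weight exp(-x^2 / 2)
    w = weights / np.sqrt(2.0 * np.pi)         # normalize so the weights sum to 1
    vals = toy_value(state, nodes)
    coeffs = []
    for k in range(order + 1):
        basis = hermeval(nodes, [0.0] * k + [1.0])                    # He_k at the nodes
        coeffs.append(np.sum(w * vals * basis) / math.factorial(k))   # E[He_k^2] = k!
    return np.array(coeffs)

def pessimistic_value(state, kappa=1.0, order=4):
    # Mean-minus-std surrogate for a robust value estimate: for a Hermite
    # expansion, mean = c_0 and Var = sum_{k>=1} k! * c_k^2.
    c = gpc_coefficients(state, order=order)
    var = sum(math.factorial(k) * c[k] ** 2 for k in range(1, order + 1))
    return c[0] - kappa * np.sqrt(var)

print(pessimistic_value(state=0.5))
```

In the paper's setting, the expansion would be applied to a learned critic and combined with off-policy TD3 updates; the mean-minus-std bound above is only one possible conservative surrogate, chosen here to keep the sketch self-contained.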
Pages: 11