Robust Reinforcement Learning via Progressive Task Sequence

Cited by: 0
Authors
Li, Yike [1]
Tian, Yunzhe [1]
Tong, Endong [1]
Niu, Wenjia [1]
Liu, Jiqiang [1]
Affiliations
[1] Beijing Jiaotong University, Beijing Key Laboratory of Security and Privacy in Intelligent Transportation, Beijing, People's Republic of China
Source
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023 | 2023
Funding
National Natural Science Foundation of China
Keywords
DOI
None available
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline Classification Codes
081104; 0812; 0835; 1405
Abstract
Robust reinforcement learning (RL) has long been challenging due to the gap between simulation and the real world. Existing efforts typically address robust RL by solving a max-min problem: maximize the cumulative reward under the worst-possible perturbations. However, worst-case optimization leads either to overly conservative solutions or to an unstable training process, which in turn harms policy robustness and generalization. In this paper, we tackle this problem from both the problem formulation and the algorithm design. First, we formulate robust RL as a max-expectation optimization problem, whose goal is to find an optimal policy under both worst-case and non-worst-case perturbations. Then, we propose a novel framework, DRRL, to solve this max-expectation optimization. Given our definition of feasible tasks, a task generation and sequencing mechanism dynamically outputs tasks at a difficulty level appropriate to the current policy. With these progressive tasks, DRRL realizes dynamic multi-task learning that improves both policy robustness and training stability. Finally, extensive experiments demonstrate that the proposed method delivers strong performance on the unmanned CarRacing game and on multiple high-dimensional MuJoCo environments.
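To make the contrast concrete, the shift the abstract describes, from a worst-case objective to a max-expectation objective, can be written roughly as follows. This is a hedged reconstruction: the task variable \tau, the feasible task set \mathcal{T}, and the task distribution \mathcal{D} are assumed notation, not necessarily the paper's own.

% Assumed notation: \tau is a perturbation task, \mathcal{T} the feasible
% task set, \mathcal{D}(\mathcal{T}) a distribution over feasible tasks.
\max_{\theta} \min_{\tau \in \mathcal{T}} \mathbb{E}\!\left[\sum_{t} \gamma^{t} r_{t} \,\middle|\, \pi_{\theta}, \tau\right]
\quad\longrightarrow\quad
\max_{\theta} \; \mathbb{E}_{\tau \sim \mathcal{D}(\mathcal{T})} \, \mathbb{E}\!\left[\sum_{t} \gamma^{t} r_{t} \,\middle|\, \pi_{\theta}, \tau\right]

The task generation and sequencing mechanism can likewise be sketched as a small training loop. The Python sketch below is a minimal illustration under stated assumptions (tasks modeled as scalar perturbation severities, a toy return estimate, and a fixed target difficulty band); it is not the authors' DRRL implementation.

import random

def generate_feasible_tasks(n_tasks: int, max_severity: float) -> list[float]:
    # Each "task" here is a scalar perturbation severity applied to the
    # environment (an assumption; the paper's tasks may be richer objects).
    return [random.uniform(0.0, max_severity) for _ in range(n_tasks)]

def estimated_return(skill: float, severity: float) -> float:
    # Toy stand-in for rolling out the current policy under a perturbed
    # environment: return degrades as the perturbation outgrows the policy.
    return max(0.0, skill - severity) + random.gauss(0.0, 0.02)

def next_task(skill: float, candidates: list[float],
              low: float, high: float) -> float:
    # Keep candidates whose estimated return falls inside a target band:
    # hard enough to improve robustness, easy enough to train stably.
    scored = [(estimated_return(skill, t), t) for t in candidates]
    band = [t for ret, t in scored if low <= ret <= high]
    # If no candidate is in the band, fall back to the easiest one.
    return random.choice(band) if band else max(scored)[1]

def train(epochs: int = 10) -> None:
    skill = 0.2  # crude proxy for the current policy's competence
    for epoch in range(epochs):
        tasks = generate_feasible_tasks(n_tasks=32, max_severity=1.0)
        task = next_task(skill, tasks, low=0.05, high=0.25)
        skill += 0.05  # toy "policy update" after training on the task
        print(f"epoch {epoch}: severity {task:.2f}, skill {skill:.2f}")

if __name__ == "__main__":
    train()

Run as written, the loop selects progressively harder severities as the skill proxy grows, which mirrors the abstract's point: tasks are sequenced at an appropriate difficulty rather than always drawn from the worst case.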
Pages: 455-463
Page count: 9