Addressing Delays in Reinforcement Learning via Delayed Adversarial Imitation Learning

Cited by: 2
Authors
Xie, Minzhi [1 ]
Xia, Bo [1 ]
Yu, Yalou [1 ]
Wang, Xueqian [1 ]
Chang, Yongzhe [1 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518000, Peoples R China
Source
ARTIFICIAL NEURAL NETWORKS AND MACHINE LEARNING, ICANN 2023, PT III | 2023, Vol. 14256
Keywords
Reinforcement Learning; Delays; Adversarial Imitation Learning
DOI
10.1007/978-3-031-44213-1_23
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Observation and action delays commonly occur in real-world tasks; they violate the Markov property and consequently degrade the performance of reinforcement learning (RL) methods. Several efforts have addressed delays in RL: model-based methods train forward models to predict the unknown current state, while model-free approaches rely on state augmentation to define new Markov Decision Processes. However, previous works suffer from difficult model fine-tuning and the curse of dimensionality, which prevent them from handling delays effectively. Motivated by the strengths of imitation learning, we introduce the idea that a delayed policy can be trained by imitating undelayed expert demonstrations. Based on this idea, we propose Delayed Adversarial Imitation Learning (DAIL): a few undelayed expert demonstrations are used to construct a surrogate delayed expert, and a delayed policy is then trained to imitate this surrogate expert via adversarial imitation learning. We also present a theoretical analysis of DAIL that validates its rationality and guides the practical design of the approach. Finally, experiments on continuous control tasks demonstrate that DAIL substantially outperforms previous approaches at handling delays in RL: it converges to high performance with excellent sample efficiency even under substantial delays, whereas previous methods fail to converge due to divergence problems.
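The abstract compresses the method into a few moving parts: state augmentation (the last observed state plus the actions executed since it was observed), relabeling undelayed expert demonstrations into a surrogate delayed expert, and GAIL-style adversarial training of the delayed policy against a discriminator. The minimal PyTorch sketch below illustrates those pieces under simplifying assumptions (a constant known delay, array-valued trajectories); every name in it, such as make_augmented_state and surrogate_delayed_demos, is illustrative and not from the paper.

```python
# Illustrative sketch of a DAIL-style pipeline (not the authors' code).
# Assumptions: constant delay DELAY, undelayed expert trajectories given as
# arrays of observations and actions, GAIL-style discriminator reward.
import numpy as np
import torch
import torch.nn as nn

DELAY = 3                # assumed constant delay, in environment steps
OBS_DIM, ACT_DIM = 8, 2  # placeholder dimensions

def make_augmented_state(last_obs, action_buffer):
    """State augmentation for the delayed MDP: the most recent (delayed)
    observation concatenated with the DELAY actions taken since."""
    return np.concatenate([last_obs] + list(action_buffer))

def surrogate_delayed_demos(expert_obs, expert_acts, delay=DELAY):
    """Relabel undelayed expert trajectories into surrogate delayed
    (augmented state, action) pairs: at time t, pair the observation from
    t - delay plus the intervening expert actions with the action at t."""
    demos = []
    for t in range(delay, len(expert_acts)):
        aug = make_augmented_state(expert_obs[t - delay],
                                   expert_acts[t - delay:t])
        demos.append((aug, expert_acts[t]))
    return demos

class Discriminator(nn.Module):
    """GAIL-style discriminator over (augmented state, action) pairs."""
    def __init__(self, aug_dim, act_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(aug_dim + act_dim, 64), nn.Tanh(),
            nn.Linear(64, 1),
        )

    def forward(self, aug_state, action):
        return self.net(torch.cat([aug_state, action], dim=-1))

def imitation_reward(disc, aug_state, action):
    """Adversarial imitation reward r = -log(1 - D(s_aug, a)): the delayed
    policy is rewarded for pairs the discriminator mistakes for expert data."""
    with torch.no_grad():
        d = torch.sigmoid(disc(aug_state, action))
        return -torch.log(1.0 - d + 1e-8)
```

In a full training loop, the discriminator would be updated to separate surrogate expert pairs from delayed-policy rollouts, while an off-the-shelf RL algorithm (e.g., SAC or TRPO) maximizes the imitation reward over the augmented delayed MDP.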
Pages: 271-282
Page count: 12