Solving time-delay issues in reinforcement learning via transformers

Cited by: 0
Authors
Xia, Bo [1]
Yang, Zaihui [1]
Xie, Minzhi [1]
Chang, Yongzhe [1]
Yuan, Bo [2]
Li, Zhiheng [1]
Wang, Xueqian [1]
Liang, Bin [3]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[2] Tsinghua Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Keywords
Deep reinforcement learning; Time delay; Deterministic delayed Markov Decision Process; Offline reinforcement learning; Decision transformer; SYSTEMS;
DOI
10.1007/s10489-024-05830-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Observation and action delays in remote control scenarios significantly challenge the decision-making of agents that depend on immediate interaction, particularly under traditional deep reinforcement learning (DRL) algorithms. Existing approaches tackle this problem through strategies such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, making them time-consuming and resource-intensive. To handle time delays effectively and obtain a competent, robust RL algorithm, the Augmented Decision Transformer (ADT) is proposed as the first offline RL algorithm designed to enable agents to manage diverse tasks with various constant delays. It transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments using an offline dataset from undelayed environments. The Decision Transformer, an autoregressive model, is then employed to train a decision model based on expected rewards, past state sequences, and past action sequences. Extensive experiments on MuJoCo and Adroit tasks validate the robustness and efficiency of the ADT, with its average performance across all tasks being 56% better than the worst-performing comparative algorithms. The results demonstrate that the ADT can outperform state-of-the-art RL counterparts, achieving superior performance across various tasks with different delay conditions.
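The abstract's step of simulating delayed trajectories from an undelayed offline dataset can be illustrated with a small sketch. This record does not give ADT's exact construction, so the code below assumes the common augmented-state formulation for a constant delay d: the agent sees the state from d steps ago together with the d actions issued since then. The function name `augment_with_delay` and the list-based data layout are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def augment_with_delay(
    states: Sequence[Sequence[float]],
    actions: Sequence[Sequence[float]],
    delay: int,
) -> List[List[float]]:
    """Relabel an undelayed trajectory as if observations arrived `delay` steps late.

    Assumption (not from the source): the augmented state at step t is
    (s_{t-d}, a_{t-d}, ..., a_{t-1}), flattened into one vector.
    """
    if len(states) != len(actions):
        raise ValueError("states and actions must have the same length")
    if not 0 <= delay < len(states):
        raise ValueError("delay must satisfy 0 <= delay < trajectory length")

    augmented = []
    for t in range(delay, len(states)):
        # Start from the most recent state the delayed agent has actually seen.
        row = list(states[t - delay])
        # Append the actions already issued but not yet reflected in that state.
        for past_action in actions[t - delay:t]:
            row.extend(past_action)
        augmented.append(row)
    return augmented


# Toy trajectory: 5 steps, 2-D states, 1-D actions.
toy_states = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
toy_actions = [[0], [10], [20], [30], [40]]
delayed = augment_with_delay(toy_states, toy_actions, delay=2)
```

With a delay of 0 the function returns the original states unchanged, which is a quick sanity check that the relabeling reduces to the delay-free case.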
Pages: 12156-12176
Page count: 21