Solving time-delay issues in reinforcement learning via transformers

Cited by: 0
Authors
Xia, Bo [1 ]
Yang, Zaihui [1 ]
Xie, Minzhi [1 ]
Chang, Yongzhe [1 ]
Yuan, Bo [2 ]
Li, Zhiheng [1 ]
Wang, Xueqian [1 ]
Liang, Bin [3 ]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[2] Tsinghua Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Keywords
Deep reinforcement learning; Time delay; Deterministic delayed Markov Decision Process; Offline reinforcement learning; Decision transformer; SYSTEMS;
DOI
10.1007/s10489-024-05830-2
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Observation and action delays in remote control scenarios significantly challenge the decision-making of agents that depend on immediate interaction, particularly under traditional deep reinforcement learning (DRL) algorithms. Existing approaches tackle this problem with strategies such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, making them time-consuming and resource-intensive. To handle time-delay challenges effectively and develop a competent and robust RL algorithm, the Augmented Decision Transformer (ADT) is proposed as the first offline RL algorithm designed to enable agents to manage diverse tasks with various constant delays. It transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments using an offline dataset collected in undelayed environments. The Decision Transformer, an autoregressive model, is then employed to train a decision model based on expected rewards, past state sequences and past action sequences. Extensive experiments on MuJoCo and Adroit tasks validate the robustness and efficiency of the ADT, with its average performance across all tasks being 56% better than the worst-performing comparative algorithms. The results demonstrate that the ADT can outperform state-of-the-art RL counterparts, achieving superior performance across various tasks with different delay conditions.
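The abstract's transformation step, simulating delayed trajectories from undelayed offline data, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the common state-augmentation construction for constant delays, where the agent sees the observation from `delay` steps ago together with the actions issued since then, and it assumes the reward logged at step t stays paired with the action at step t. The function names `simulate_delayed_trajectory` and `returns_to_go` are hypothetical.

```python
import numpy as np


def simulate_delayed_trajectory(states, actions, rewards, delay):
    """Re-index an undelayed trajectory as if observations arrived `delay` steps late.

    Assumption (not taken from the paper): at step t the agent sees
    states[t - delay] plus the `delay` actions issued since then. This
    augmented state is a standard way to turn a constant-delay MDP into a
    delay-free one.
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    horizon = len(states)
    if delay < 0 or delay >= horizon:
        raise ValueError("delay must satisfy 0 <= delay < len(states)")

    aug_states, aug_actions, aug_rewards = [], [], []
    for t in range(delay, horizon):
        # Actions issued after the last visible observation, flattened.
        pending = actions[t - delay:t].reshape(-1)
        stale_obs = states[t - delay].reshape(-1)
        aug_states.append(np.concatenate([stale_obs, pending]))
        aug_actions.append(actions[t])
        aug_rewards.append(rewards[t])
    return np.stack(aug_states), np.stack(aug_actions), np.array(aug_rewards)


def returns_to_go(rewards):
    """Suffix sums of rewards, the return conditioning used by Decision Transformer."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]
```

The augmented states, actions, and returns-to-go produced this way could then be fed to a sequence model as (return, state, action) tokens. How the paper itself aligns rewards and handles action delays is not stated in the abstract.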
Pages: 12156 - 12176
Page count: 21
Related papers
50 in total
  • [31] Sampled-Data Extremum Seeking With Constant Delay: A Time-Delay Approach
    Zhu, Yang
    Fridman, Emilia
    Oliveira, Tiago Roux
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 2023, 68 (01) : 432 - 439
  • [32] Adaptive pseudospectral methods for solving constrained linear and nonlinear time-delay optimal control problems
    Malekin, Mohammad
    Hashim, Ishak
    JOURNAL OF THE FRANKLIN INSTITUTE-ENGINEERING AND APPLIED MATHEMATICS, 2014, 351 (02) : 811 - 839
  • [33] Time-Delay Compensation by Communication Disturbance Observer for Bilateral Teleoperation Under Time-Varying Delay
    Natori, Kenji
    Tsuji, Toshiaki
    Ohnishi, Kouhei
    Hace, Ales
    Jezernik, Karel
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2010, 57 (03) : 1050 - 1062
  • [34] Networked Control System Time-Delay Compensation Based on Time-Delay Prediction and Improved Implicit GPC
    Tian, Zhong-Da
    Li, Shu-Jiang
    Wang, Yan-Hong
    Yu, Hong-Xia
    ALGORITHMS, 2015, 8 (01) : 3 - 18
  • [35] Research on Deep Reinforcement Learning Control Algorithm for Active Suspension Considering Uncertain Time Delay
    Wang, Yang
    Wang, Cheng
    Zhao, Shijie
    Guo, Konghui
    SENSORS, 2023, 23 (18)
  • [36] ADSORPTION-KINETICS WITH TIME-DELAY
    OHSHIMA, H
    FUJITA, N
    KONDO, T
    COLLOID AND POLYMER SCIENCE, 1992, 270 (07) : 707 - 710
  • [37] TIME-DELAY ART FOR SPATIOTEMPORAL PATTERNS
    HAGIWARA, M
    NEUROCOMPUTING, 1994, 6 (5-6) : 513 - 521
  • [38] Switching control and time-delay identification
    Chen, Qi
    Li, Xiang
    Qin, Zhi-Chang
    Zhong, Shun
    Sun, J. Q.
    COMMUNICATIONS IN NONLINEAR SCIENCE AND NUMERICAL SIMULATION, 2014, 19 (12) : 4161 - 4169
  • [39] Adaptive Control for Haptics with Time-Delay
    Richert, D.
    Macnab, C. J. B.
    Pieper, J. K.
    49TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), 2010 : 3650 - 3655
  • [40] Time-delay coordinates and polynomial mappings
    Pollicott, M
    ADVANCES IN MATHEMATICS, 2003, 177 (02) : 280 - 296