Solving time-delay issues in reinforcement learning via transformers

Cited by: 0
Authors
Xia, Bo [1]
Yang, Zaihui [1]
Xie, Minzhi [1]
Chang, Yongzhe [1]
Yuan, Bo [2]
Li, Zhiheng [1]
Wang, Xueqian [1]
Liang, Bin [3]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[2] Tsinghua Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Keywords
Deep reinforcement learning; Time delay; Deterministic delayed Markov Decision Process; Offline reinforcement learning; Decision transformer; SYSTEMS;
DOI
10.1007/s10489-024-05830-2
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The presence of observation and action delays in remote control scenarios significantly challenges the decision-making of agents that depend on immediate interactions, particularly within traditional deep reinforcement learning (DRL) algorithms. Existing approaches attempt to tackle this problem through various strategies, such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, making them time-consuming and resource-intensive. To handle time-delay challenges effectively and develop a competent and robust RL algorithm, the Augmented Decision Transformer (ADT) is proposed as the first offline RL algorithm designed to enable agents to manage diverse tasks with various constant delays. It transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments using offline datasets from undelayed environments. The Decision Transformer, an autoregressive model, is then employed to train a decision model based on expected rewards, past state sequences and past action sequences. Extensive experiments conducted on MuJoCo and Adroit tasks validate the robustness and efficiency of the ADT, with its average performance across all tasks being 56% better than the worst-performing comparative algorithms. The results demonstrate that the ADT can outperform state-of-the-art RL counterparts, achieving superior performance across various tasks with different delay conditions.
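The abstract's idea of simulating delayed trajectories from an undelayed offline dataset can be illustrated with a minimal sketch. It assumes the textbook augmented-state construction for a constant observation delay (the delayed observation plus the actions issued since that observation) and the return-to-go conditioning used by Decision Transformer. The function names, the `noop` padding value and the tuple layout are hypothetical and are not taken from the paper, whose exact procedure may differ.

```python
from typing import Any, List, Sequence, Tuple


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of rewards from each step to the end of the episode."""
    out: List[float] = []
    running = 0.0
    for r in reversed(rewards):
        running += r
        out.append(running)
    out.reverse()
    return out


def simulate_delayed_trajectory(
    states: Sequence[Any],
    actions: Sequence[Any],
    rewards: Sequence[float],
    delay: int,
    noop: Any,
) -> List[Tuple[Any, Tuple[Any, ...], Any, float]]:
    """Re-index an undelayed trajectory as if observations arrived `delay` steps late.

    At step t the agent is assumed to see states[max(t - delay, 0)] together
    with the `delay` most recent actions it issued (padded with `noop` before
    the episode start). Each output item is
    (delayed_state, pending_actions, action_label, return_to_go).
    """
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")
    if delay < 0:
        raise ValueError("delay must be non-negative")

    rtg = returns_to_go(rewards)
    samples = []
    for t in range(len(states)):
        delayed_state = states[max(t - delay, 0)]
        # Actions issued after the delayed observation and before step t.
        pending = tuple(
            actions[k] if k >= 0 else noop for k in range(t - delay, t)
        )
        samples.append((delayed_state, pending, actions[t], rtg[t]))
    return samples


if __name__ == "__main__":
    demo = simulate_delayed_trajectory(
        states=[0.0, 1.0, 2.0, 3.0],
        actions=["a0", "a1", "a2", "a3"],
        rewards=[1.0, 0.5, 2.0, 0.0],
        delay=2,
        noop="noop",
    )
    for sample in demo:
        print(sample)
```

With `delay=0` the pending-action tuple is empty and each sample reduces to the ordinary undelayed (state, action, return-to-go) triple, which is one way to sanity-check the re-indexing.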
Pages: 12156-12176
Page count: 21
Related papers
50 in total
  • [21] Cosmological solutions with time-delay
    Paliathanasis, Andronikos
    MODERN PHYSICS LETTERS A, 2022, 37 (25)
  • [22] Adaptive time-delay controller
    Rad, AB
    Lo, WL
    Tsang, KM
    IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS, 2000, 47 (06) : 1350 - 1353
  • [23] A suboptimal control of linear time-delay problems via dynamic programming
    Orimi, Atefeh Gooran
    Effati, Sohrab
    Farahi, Mohammad Hadi
    IMA JOURNAL OF MATHEMATICAL CONTROL AND INFORMATION, 2022, 39 (02) : 675 - 707
  • [24] Finite-Time Stabilization for Stochastic Inertial Neural Networks with Time-Delay via Nonlinear Delay Controller
    Li, Deyi
    Wang, Yuanyuan
    Chen, Guici
    Zhu, Shasha
    MATHEMATICAL PROBLEMS IN ENGINEERING, 2018, 2018
  • [25] TIME-DELAY IN RANDOM SCATTERING
    FARIS, WG
    TSAY, WJ
    SIAM JOURNAL ON APPLIED MATHEMATICS, 1994, 54 (02) : 443 - 455
  • [26] Approximation Design of Composite Control for Singularly Perturbed Time-delay Systems via Delay Compensation
    Zhang Baolin
    Gao Dexin
    Lu Qiang
    Cao Feilong
    PROCEEDINGS OF THE 29TH CHINESE CONTROL CONFERENCE, 2010 : 1676 - 1680
  • [27] Solving job shop scheduling problems via deep reinforcement learning
    Yuan, Erdong
    Cheng, Shuli
    Wang, Liejun
    Song, Shiji
    Wu, Fang
    APPLIED SOFT COMPUTING, 2023, 143
  • [28] Solving Maximal Stable Set Problem via Deep Reinforcement Learning
    Wang, Taiyi
    Shi, Jiahao
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021 : 483 - 489
  • [29] Synchronization of directed complex networks with uncertainty and time-delay
    Wu, Yunlong
    Zhao, Qian
    Li, Hui
    INTERNATIONAL JOURNAL OF DISTRIBUTED SENSOR NETWORKS, 2018, 14 (05)
  • [30] Design of the linear controller of a class of time-delay chaos
    You, Ting
    Hu, Yueli
    CLUSTER COMPUTING-THE JOURNAL OF NETWORKS SOFTWARE TOOLS AND APPLICATIONS, 2019, 22 (02) : S2639 - S2644