Solving time-delay issues in reinforcement learning via transformers

Cited by: 0
Authors
Xia, Bo [1]
Yang, Zaihui [1]
Xie, Minzhi [1]
Chang, Yongzhe [1]
Yuan, Bo [2]
Li, Zhiheng [1]
Wang, Xueqian [1]
Liang, Bin [3]
Affiliations
[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China
[2] Tsinghua Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China
[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China
Keywords
Deep reinforcement learning; Time delay; Deterministic delayed Markov Decision Process; Offline reinforcement learning; Decision transformer; SYSTEMS;
DOI
10.1007/s10489-024-05830-2
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104; 0812; 0835; 1405;
Abstract
Observation and action delays in remote control scenarios significantly challenge the decision-making of agents that depend on immediate interaction, particularly under traditional deep reinforcement learning (DRL) algorithms. Existing approaches tackle this problem through strategies such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, making them time-consuming and resource-intensive. To handle time-delay challenges effectively and develop a competent and robust RL algorithm, the Augmented Decision Transformer (ADT) is proposed as the first offline RL algorithm designed to enable agents to manage diverse tasks with various constant delays. It transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments using offline datasets from undelayed environments. The Decision Transformer, an autoregressive model, is then employed to train a decision model based on expected rewards, past state sequences and past action sequences. Extensive experiments on MuJoCo and Adroit tasks validate the robustness and efficiency of the ADT: its average performance across all tasks is 56% better than that of the worst-performing comparison algorithms. The results demonstrate that the ADT can outperform state-of-the-art RL counterparts, achieving superior performance across various tasks with different delay conditions.
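The abstract's step of simulating delayed trajectories from undelayed offline data can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the common state-augmentation construction for constant delays, in which the agent at step t sees the state from `delay` steps earlier together with the actions it has issued since then. The function name `simulate_delayed_trajectory`, the tuple-based data layout, and the choice to keep each reward paired with its original action are all hypothetical choices made for illustration.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, ...]
Action = Tuple[float, ...]
# An augmented state: (stale observed state, actions issued since that state).
AugmentedState = Tuple[State, Tuple[Action, ...]]


def simulate_delayed_trajectory(
    states: Sequence[State],
    actions: Sequence[Action],
    rewards: Sequence[float],
    delay: int,
) -> List[Tuple[AugmentedState, Action, float]]:
    """Relabel an undelayed trajectory as if observations arrived `delay` steps late.

    Assumption (illustrative only): states[t] is the state in which actions[t]
    was taken, and rewards[t] is the reward for that step.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")

    relabeled = []
    # The first `delay` steps are skipped: no delayed observation exists yet.
    for t in range(delay, len(actions)):
        observed = states[t - delay]            # stale observation available at step t
        pending = tuple(actions[t - delay:t])   # actions taken since that observation
        relabeled.append(((observed, pending), actions[t], rewards[t]))
    return relabeled


# Hypothetical toy data: 4 steps, 1-D states and actions, constant delay of 2.
toy_states = [(0.0,), (1.0,), (2.0,), (3.0,)]
toy_actions = [(10.0,), (11.0,), (12.0,), (13.0,)]
toy_rewards = [0.1, 0.2, 0.3, 0.4]
delayed = simulate_delayed_trajectory(toy_states, toy_actions, toy_rewards, delay=2)
```

With a delay of 0 the pending-action tuple is empty and the relabeled trajectory matches the original one, which is a quick sanity check on the construction. A sequence model such as the Decision Transformer could then be trained on the augmented states in place of the raw ones, although how the paper encodes them is not stated in the abstract.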
Pages: 12156-12176
Page count: 21