Solving time-delay issues in reinforcement learning via transformers

Times cited: 0

Authors:

Xia, Bo [1]

Yang, Zaihui [1]

Xie, Minzhi [1]

Chang, Yongzhe [1]

Yuan, Bo [2]

Li, Zhiheng [1]

Wang, Xueqian [1]

Liang, Bin [3]

Affiliations:

[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China

[2] Tsinghua Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China

[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China

Source:

APPLIED INTELLIGENCE | 2024 / Vol. 54 / Issue 23

Keywords:

Deep reinforcement learning; Time delay; Deterministic delayed Markov Decision Process; Offline reinforcement learning; Decision transformer; SYSTEMS

DOI:

10.1007/s10489-024-05830-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

The presence of observation and action delays in remote control scenarios significantly challenges the decision-making of agents that depend on immediate interactions, particularly within traditional deep reinforcement learning (DRL) algorithms. Existing approaches attempt to tackle this problem through various strategies, such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, making them time-consuming and resource-intensive. To handle time-delay challenges effectively and develop a competent and robust RL algorithm, the Augmented Decision Transformer (ADT) is proposed as the first offline RL algorithm designed to enable agents to manage diverse tasks with various constant delays. It transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments using offline datasets from undelayed environments. The Decision Transformer, an autoregressive model, is then employed to train a decision model based on expected rewards, past state sequences, and past action sequences. Extensive experiments conducted on MuJoCo and Adroit tasks validate the robustness and efficiency of the ADT, with its average performance across all tasks being 56% better than the worst-performing comparative algorithms. The results demonstrate that the ADT can outperform state-of-the-art RL counterparts, achieving superior performance across various tasks with different delay conditions.
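The trajectory simulation and the return conditioning mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the standard augmented-state construction for a constant delay d, in which the agent at step t sees the state from step t-d together with the d actions issued since then, and it assumes rewards stay aligned with the step at which the action was chosen. The function names `augment_trajectory` and `returns_to_go` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def augment_trajectory(
    states: Sequence[Vector],
    actions: Sequence[Vector],
    rewards: Sequence[float],
    delay: int,
) -> List[Tuple[Vector, Vector, float]]:
    """Relabel an undelayed trajectory as if observations arrived `delay` steps late.

    Assumption (not taken from the paper): the augmented state at step t is
    s_{t-delay} concatenated with a_{t-delay}, ..., a_{t-1}. Steps t < delay
    are skipped because no delayed observation exists for them yet.
    """
    if delay < 0:
        raise ValueError("delay must be non-negative")
    if not (len(states) == len(actions) == len(rewards)):
        raise ValueError("states, actions and rewards must have equal length")

    augmented = []
    for t in range(delay, len(states)):
        x = list(states[t - delay])           # last state the agent could have seen
        for a in actions[t - delay:t]:        # actions issued since that state
            x.extend(a)
        augmented.append((tuple(x), actions[t], rewards[t]))
    return augmented


def returns_to_go(rewards: Sequence[float]) -> List[float]:
    """Undiscounted sum of future rewards at each step, the quantity a
    Decision Transformer is conditioned on."""
    rtg = [0.0] * len(rewards)
    running = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        running += rewards[i]
        rtg[i] = running
    return rtg


if __name__ == "__main__":
    states = [(0.0,), (1.0,), (2.0,), (3.0,)]
    actions = [(10.0,), (11.0,), (12.0,), (13.0,)]
    rewards = [1.0, 2.0, 3.0, 4.0]

    delayed = augment_trajectory(states, actions, rewards, delay=2)
    rtg = returns_to_go([r for _, _, r in delayed])
    for (x, a, r), g in zip(delayed, rtg):
        print(x, a, r, g)
```

With a delay of 0 the augmented state reduces to the original state, so the same pipeline covers the undelayed case. The resulting (return-to-go, augmented state, action) triples are the kind of sequence an autoregressive decision model could be trained on.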


Pages: 12156-12176

Page count: 21

