Solving time-delay issues in reinforcement learning via transformers

Times cited: 0

Authors:

Xia, Bo [1]

Yang, Zaihui [1]

Xie, Minzhi [1]

Chang, Yongzhe [1]

Yuan, Bo [2]

Li, Zhiheng [1]

Wang, Xueqian [1]

Liang, Bin [3]

Affiliations:

[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China

[2] Tsinghua Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China

[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China

Source:

APPLIED INTELLIGENCE | 2024 / Vol. 54 / No. 23

Keywords:

Deep reinforcement learning; Time delay; Deterministic delayed Markov Decision Process; Offline reinforcement learning; Decision transformer; SYSTEMS;

DOI:

10.1007/s10489-024-05830-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

The presence of observation and action delays in remote control scenarios significantly challenges the decision-making of agents that depend on immediate interactions, particularly within traditional deep reinforcement learning (DRL) algorithms. Existing approaches attempt to tackle this problem through various strategies, such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, making them time-consuming and resource-intensive. To effectively handle time-delay challenges and develop a competent and robust RL algorithm, the Augmented Decision Transformer (ADT) is proposed as the first offline RL algorithm designed to enable agents to manage diverse tasks with various constant delays. It transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments using offline datasets collected in undelayed environments. The Decision Transformer, an autoregressive model, is then employed to train a decision model based on expected rewards, past state sequences, and past action sequences. Extensive experiments conducted on MuJoCo and Adroit tasks validate the robustness and efficiency of the ADT, with its average performance across all tasks being 56% better than that of the worst-performing comparison algorithms. The results demonstrate that the ADT can outperform state-of-the-art RL counterparts, achieving superior performance across various tasks with different delay conditions.
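Turning a constant-delay DDMDP into a standard MDP is commonly done by state augmentation: with a constant delay d, the agent's input at step t is the last observed state s_{t-d} together with the d actions issued since then. The following is a minimal sketch in Python with NumPy, assuming that augmentation scheme, of how such augmented states could be built from an undelayed offline trajectory, plus the undiscounted returns-to-go that a Decision Transformer conditions on. The function names augment_delayed and returns_to_go and the padding rule for t < d (reuse s_0, fill missing actions with zeros) are hypothetical choices for illustration, not the paper's published implementation; the ADT's exact construction may differ.

```python
import numpy as np


def augment_delayed(states, actions, delay):
    """Build augmented states x_t = (s_{t-d}, a_{t-d}, ..., a_{t-1}).

    Hypothetical sketch: simulates a constant delay d on an undelayed
    trajectory. For t < d, the missing past state is replaced by s_0 and
    missing past actions by zero vectors (an assumed padding rule).
    """
    states = np.asarray(states, dtype=float)    # shape (T, state_dim)
    actions = np.asarray(actions, dtype=float)  # shape (T, action_dim)
    T, action_dim = actions.shape
    # Row j of padded_actions holds a_{j-d}; the first d rows are zero padding.
    padded_actions = np.vstack([np.zeros((delay, action_dim)), actions])
    augmented = []
    for t in range(T):
        observed = states[max(t - delay, 0)]  # last state the agent has seen
        # Pending actions a_{t-d}, ..., a_{t-1} are rows t .. t+d-1.
        pending = padded_actions[t:t + delay].ravel()
        augmented.append(np.concatenate([observed, pending]))
    return np.stack(augmented)


def returns_to_go(rewards):
    """Undiscounted returns-to-go R_t = r_t + r_{t+1} + ... + r_{T-1}."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]
```

For example, augment_delayed(np.arange(10).reshape(5, 2), np.ones((5, 1)), delay=2) returns an array of shape (5, 4): each row is the delayed state followed by the two pending actions, and row 3 is [2, 3, 1, 1]. Likewise, returns_to_go([1, 2, 3]) gives [6, 5, 3]. Keeping the augmentation separate from the sequence model means the same offline dataset can be re-used for any constant delay d.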


Pages: 12156-12176

Number of pages: 21
