Solving time-delay issues in reinforcement learning via transformers

Cited by: 0

Authors:

Xia, Bo [1]

Yang, Zaihui [1]

Xie, Minzhi [1]

Chang, Yongzhe [1]

Yuan, Bo [2]

Li, Zhiheng [1]

Wang, Xueqian [1]

Liang, Bin [3]

Affiliations:

[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China

[2] Tsinghua Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China

[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China

Source:

APPLIED INTELLIGENCE | 2024 / Vol. 54 / Issue 23

Keywords:

Deep reinforcement learning; Time delay; Deterministic delayed Markov Decision Process; Offline reinforcement learning; Decision transformer; SYSTEMS;

DOI:

10.1007/s10489-024-05830-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Observation and action delays in remote control scenarios significantly challenge the decision-making of agents that depend on immediate interaction, particularly under traditional deep reinforcement learning (DRL) algorithms. Existing approaches tackle this problem through strategies such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, making them time-consuming and resource-intensive. To handle time-delay challenges effectively and develop a competent and robust RL algorithm, the Augmented Decision Transformer (ADT) is proposed as the first offline RL algorithm designed to enable agents to manage diverse tasks with various constant delays. It transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments using offline datasets from undelayed environments. The Decision Transformer, an autoregressive model, is then employed to train a decision model based on expected rewards, past state sequences, and past action sequences. Extensive experiments on MuJoCo and Adroit tasks validate the robustness and efficiency of the ADT: its average performance across all tasks is 56% better than that of the worst-performing comparative algorithms. The results demonstrate that the ADT can outperform state-of-the-art RL counterparts, achieving superior performance across various tasks with different delay conditions.
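The transformation described in the abstract can be illustrated with a minimal sketch. It assumes the common state-augmentation construction for constant delays, in which the agent at step t sees the state from step t - d together with the d actions issued since then; the abstract does not spell out the paper's exact construction, so this is not the authors' implementation. The function names `augment_with_delay` and `returns_to_go` and the array layout are hypothetical. The second helper computes the return-to-go signal that the original Decision Transformer conditions on, which is assumed here to correspond to the "expected rewards" mentioned above.

```python
import numpy as np


def augment_with_delay(states, actions, delay):
    """Simulate a constant-delay trajectory from an undelayed one.

    Assumption (not taken from the paper): states has shape (T, ds),
    actions has shape (T, da), and actions[t] was taken in states[t].
    The augmented state at step t is (s_{t-delay}, a_{t-delay}, ..., a_{t-1}).
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    horizon = len(actions)
    if len(states) != horizon:
        raise ValueError("states and actions must have the same length")
    if not 0 <= delay < horizon:
        raise ValueError("delay must satisfy 0 <= delay < trajectory length")

    augmented = []
    for t in range(delay, horizon):
        # Delayed observation plus the actions still "in flight".
        pending = actions[t - delay:t].ravel()
        augmented.append(np.concatenate([states[t - delay], pending]))

    # actions[delay:] are the actions chosen from each augmented state.
    return np.stack(augmented), actions[delay:]


def returns_to_go(rewards):
    """Sum of rewards from each step to the end of the trajectory."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]


if __name__ == "__main__":
    s = np.arange(5).reshape(5, 1)
    a = 10 * np.arange(5).reshape(5, 1)
    aug_states, aug_actions = augment_with_delay(s, a, delay=2)
    print(aug_states)
    print(aug_actions)
    print(returns_to_go([1.0, 2.0, 3.0]))
```

With `delay=0` the augmented states reduce to the original states, which is a quick sanity check that the delayed and undelayed formulations coincide when no delay is present.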

Pages: 12156-12176

Page count: 21
