Solving time-delay issues in reinforcement learning via transformers

Times cited: 0

Authors:

Xia, Bo [1]

Yang, Zaihui [1]

Xie, Minzhi [1]

Chang, Yongzhe [1]

Yuan, Bo [2]

Li, Zhiheng [1]

Wang, Xueqian [1]

Liang, Bin [3]

Affiliations:

[1] Tsinghua Univ, Shenzhen Int Grad Sch, Shenzhen 518055, Peoples R China

[2] Tsinghua Univ Shenzhen, Res Inst, Shenzhen 518057, Peoples R China

[3] Tsinghua Univ, Dept Automat, Beijing 100084, Peoples R China

Source:

APPLIED INTELLIGENCE | 2024 / Vol. 54 / No. 23

Keywords:

Deep reinforcement learning; Time delay; Deterministic delayed Markov Decision Process; Offline reinforcement learning; Decision transformer; SYSTEMS;

DOI:

10.1007/s10489-024-05830-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification codes:

081104; 0812; 0835; 1405

Abstract:

Observation and action delays in remote control scenarios significantly challenge the decision-making of agents that depend on immediate interactions, particularly within traditional deep reinforcement learning (DRL) algorithms. Existing approaches tackle this problem through strategies such as predicting delayed states or transforming delayed Markov Decision Processes (MDPs) into delay-free equivalents. However, both model-free and model-based methods require extensive online data, which makes them time-consuming and resource-intensive. To handle time delays effectively and obtain a competent, robust RL algorithm, this paper proposes the Augmented Decision Transformer (ADT), the first offline RL algorithm designed to let agents manage diverse tasks under various constant delays. ADT transforms a deterministic delayed MDP (DDMDP) into a standard MDP by simulating trajectories in delayed environments from offline datasets collected in undelayed environments. A Decision Transformer, an autoregressive model, is then used to train a decision model conditioned on expected rewards, past state sequences, and past action sequences. Extensive experiments on MuJoCo and Adroit tasks validate the robustness and efficiency of ADT: averaged across all tasks, its performance is 56% higher than that of the worst-performing comparison algorithms. The results demonstrate that ADT can outperform state-of-the-art RL counterparts across various tasks and delay conditions.
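The abstract says ADT replays undelayed offline trajectories as if they had been collected under a constant delay, and then feeds a Decision Transformer with return, state, and action sequences. The record does not spell out that construction, so the following is only a minimal sketch under stated assumptions: it uses the common state-augmentation view of a constant-delay MDP (the augmented state is the last observed state plus the actions still "in flight"), an undiscounted return-to-go, and the convention of dropping the first `delay` steps before any observation arrives. The helper names `delay_augment`, `returns_to_go`, and `to_decision_transformer_inputs` are hypothetical and are not the authors' code.

```python
import numpy as np


def delay_augment(states, actions, rewards, delay):
    """Hypothetical helper: re-express an undelayed trajectory under a constant delay.

    Assumed shapes: states (T, obs_dim), actions (T, act_dim), rewards (T,).
    At step t the delayed agent sees only s_{t-delay} and remembers the
    `delay` actions it has already issued, a_{t-delay}, ..., a_{t-1}.
    Concatenating them gives a Markov augmented state. The first `delay`
    steps are dropped (an assumed convention; padding is another option).
    """
    states = np.asarray(states, dtype=float)
    actions = np.asarray(actions, dtype=float)
    rewards = np.asarray(rewards, dtype=float)
    T = len(states)
    if delay < 0 or delay >= T:
        raise ValueError("delay must satisfy 0 <= delay < trajectory length")
    augmented = [
        # last observed state followed by the pending action buffer
        np.concatenate([states[t - delay], actions[t - delay:t].ravel()])
        for t in range(delay, T)
    ]
    return np.stack(augmented), actions[delay:], rewards[delay:]


def returns_to_go(rewards):
    """Undiscounted return-to-go R_t = r_t + r_{t+1} + ... + r_{T-1}."""
    rewards = np.asarray(rewards, dtype=float)
    return np.cumsum(rewards[::-1])[::-1]


def to_decision_transformer_inputs(states, actions, rewards, delay):
    """Return (returns-to-go, augmented states, actions) for a sequence model."""
    aug_states, acts, rews = delay_augment(states, actions, rewards, delay)
    return returns_to_go(rews), aug_states, acts


# Usage on a toy 5-step trajectory with a delay of 2 steps
s = np.arange(5.0).reshape(5, 1)
a = 10 * np.arange(5.0).reshape(5, 1)
r = np.ones(5)
rtg, x, act = to_decision_transformer_inputs(s, a, r, delay=2)
print(x.tolist())    # [[0.0, 0.0, 10.0], [1.0, 10.0, 20.0], [2.0, 20.0, 30.0]]
print(rtg.tolist())  # [3.0, 2.0, 1.0]
```

With `delay=0` the augmented state reduces to the original state, which is a quick sanity check that the augmentation only adds the pending-action buffer.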

Pages: 12156–12176

Number of pages: 21
