A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers

Cited by: 2
Authors
Villarrubia-Martin, Enrique Adrian [1 ]
Rodriguez-Benitez, Luis [1 ]
Jimenez-Linares, Luis [1 ]
Munoz-Valero, David [2 ]
Liu, Jun [3 ]
Affiliations
[1] Univ Castilla La Mancha, Dept Technol & Informat Syst, Paseo Univ, Ciudad Real 13005, Spain
[2] Univ Castilla La Mancha, Dept Technol & Informat Syst, Ave Carlos III,s-n, Toledo 45004, Spain
[3] Univ Ulster, Sch Comp, Belfast, Northern Ireland
Keywords
Reinforcement learning; self-attention; off-policy; Transformer; experience replay
DOI
10.1142/S012906572350065X
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) is a powerful technique that allows agents to learn optimal decision-making policies through interaction with an environment. However, traditional RL algorithms suffer from several limitations, such as the need for large amounts of data and long-term credit assignment, i.e. the problem of determining which actions actually produced a given reward. Recently, Transformers have shown their capacity to address these constraints in the offline RL setting. This paper proposes a framework that uses Transformers to enhance the training of online off-policy RL agents and to address the challenges described above through self-attention. The proposal introduces a hybrid agent with a mixed policy that combines an online off-policy agent with an offline Transformer agent based on the Decision Transformer architecture. By sequentially exchanging the experience replay buffer between the agents, training efficiency is improved in the early iterations, as is the training of Transformer-based RL agents in situations with limited data availability or unknown environments.
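The abstract only summarizes the mechanism; a minimal, hypothetical sketch of the buffer-exchange loop it describes might look as follows. All class and function names here are assumptions for illustration, not taken from the paper, and both agents are stubs standing in for a real off-policy learner (e.g. DQN-style) and a Decision-Transformer-style sequence model.

```python
import random
from collections import deque

class ReplayBuffer:
    """Experience buffer shared (exchanged) between the two agents."""
    def __init__(self, capacity=10_000):
        self.buffer = deque(maxlen=capacity)

    def add(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

    def __len__(self):
        return len(self.buffer)

class OnlineOffPolicyAgent:
    """Stub online off-policy agent; a real one would do TD updates."""
    def act(self, state):
        return random.choice([0, 1])

    def update(self, batch):
        pass  # gradient step on sampled transitions would go here

class OfflineTransformerAgent:
    """Stub Decision-Transformer-style agent trained offline on the buffer."""
    def act(self, state):
        return random.choice([0, 1])

    def fit(self, transitions):
        pass  # sequence-model training on (return, state, action) tokens

def run_hybrid(episodes=4, steps=16, switch_every=2):
    """Mixed policy: alternate which agent drives, both learn from one buffer."""
    buffer = ReplayBuffer()
    online, offline = OnlineOffPolicyAgent(), OfflineTransformerAgent()
    for ep in range(episodes):
        # Sequentially hand control between the two agents.
        agent = online if (ep // switch_every) % 2 == 0 else offline
        state = 0.0
        for _ in range(steps):
            action = agent.act(state)
            next_state = state + random.uniform(-1.0, 1.0)  # toy dynamics
            reward = 1.0 if action == 1 else 0.0            # toy reward
            buffer.add((state, action, reward, next_state))
            state = next_state
        online.update(buffer.sample(8))   # off-policy learning from shared buffer
        offline.fit(list(buffer.buffer))  # Transformer retrained on same experience
    return len(buffer)
```

In this sketch, both agents read from and write to the same replay buffer, so experience collected while one agent drives the policy is available to train the other, which is the core of the hybrid scheme the abstract describes.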
Pages: 19