A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers

Cited by: 2
Authors
Villarrubia-Martin, Enrique Adrian [1 ]
Rodriguez-Benitez, Luis [1 ]
Jimenez-Linares, Luis [1 ]
Munoz-Valero, David [2 ]
Liu, Jun [3 ]
Affiliations
[1] Univ Castilla La Mancha, Dept Technol & Informat Syst, Paseo Univ, Ciudad Real 413005, Spain
[2] Univ Castilla La Mancha, Dept Technol & Informat Syst, Ave Carlos III, s/n, Toledo 45004, Spain
[3] Univ Ulster, Sch Comp, Belfast, Northern Ireland
Keywords
Reinforcement learning; self-attention; off-policy; Transformer; experience replay
DOI
10.1142/S012906572350065X
CLC number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) is a powerful technique that allows agents to learn optimal decision-making policies through interaction with an environment. However, traditional RL algorithms suffer from several limitations, such as the need for large amounts of data and long-term credit assignment, i.e. the problem of determining which actions actually produced a given reward. Recently, Transformers have shown their capacity to address these constraints in the offline RL setting. This paper proposes a framework that uses Transformers to enhance the training of online off-policy RL agents and to address the challenges described above through self-attention. The proposal introduces a hybrid agent with a mixed policy that combines an online off-policy agent with an offline Transformer agent based on the Decision Transformer architecture. By sequentially exchanging the experience replay buffer between the two agents, training efficiency is improved in the first iterations, as is the training of Transformer-based RL agents in situations with limited data availability or unknown environments.
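A minimal Python sketch of the hybrid loop the abstract describes, under heavy assumptions: OffPolicyAgent, DecisionTransformerAgent, the mixing probability `mix`, and the exchange period `exchange_every` are illustrative placeholders, not the paper's actual API or hyperparameters. It only shows the two ideas named in the abstract: a mixed policy over the two agents, and periodic hand-off of the shared replay buffer to the Transformer as offline data.

import random
from collections import deque

class OffPolicyAgent:
    """Stand-in for the online off-policy learner (e.g. a DQN-style agent)."""
    def act(self, state):
        return random.randrange(2)   # placeholder; a real agent queries a Q-network
    def update(self, batch):
        pass                         # a real agent would run a TD update on the batch

class DecisionTransformerAgent:
    """Stand-in for the offline Decision Transformer agent."""
    def act(self, state, target_return):
        return random.randrange(2)   # a real DT decodes conditioned on returns-to-go
    def fit(self, transitions):
        pass                         # a real DT trains by supervised sequence modelling

def hybrid_training(env_step, episodes=100, mix=0.5, exchange_every=10):
    """Mixed policy: with probability `mix` the Transformer picks the action,
    otherwise the off-policy agent does; every `exchange_every` episodes the
    shared replay buffer is handed to the Transformer as offline trajectories."""
    online = OffPolicyAgent()
    offline = DecisionTransformerAgent()
    replay = deque(maxlen=10_000)    # shared experience replay buffer
    for ep in range(episodes):
        state, done = 0, False
        while not done:
            if random.random() < mix:
                action = offline.act(state, target_return=1.0)
            else:
                action = online.act(state)
            next_state, reward, done = env_step(state, action)
            replay.append((state, action, reward, next_state, done))
            online.update(random.sample(replay, min(32, len(replay))))
            state = next_state
        if (ep + 1) % exchange_every == 0:
            offline.fit(list(replay))  # the "sequential exchange": buffer -> DT

# Toy usage: five-step episodes with a reward for matching the state's parity.
def toy_env_step(state, action):
    next_state = state + 1
    reward = float(action == state % 2)
    return next_state, reward, next_state >= 5

hybrid_training(toy_env_step, episodes=20)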
Pages: 19