A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers

Cited by: 2
Authors
Villarrubia-Martin, Enrique Adrian [1]
Rodriguez-Benitez, Luis [1]
Jimenez-Linares, Luis [1]
Munoz-Valero, David [2]
Liu, Jun [3]
Affiliations
[1] Univ Castilla La Mancha, Dept Technol & Informat Syst, Paseo Univ, Ciudad Real 413005, Spain
[2] Univ Castilla La Mancha, Dept Technol & Informat Syst, Ave Carlos III,s-n, Toledo 45004, Spain
[3] Univ Ulster, Sch Comp, Belfast, North Ireland
Keywords
Reinforcement learning; self-attention; off-policy; Transformer; experience replay; LEVEL;
DOI
10.1142/S012906572350065X
CLC classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) is a powerful technique that allows agents to learn optimal decision-making policies through interaction with an environment. However, traditional RL algorithms suffer from several limitations, such as the need for large amounts of data and the problem of long-term credit assignment, i.e. determining which actions actually produce a given reward. Recently, Transformers have shown their capacity to address these limitations in the offline RL setting. This paper proposes a framework that uses Transformers to enhance the training of online off-policy RL agents and to address the challenges described above through self-attention. The proposal introduces a hybrid agent with a mixed policy that combines an online off-policy agent with an offline Transformer agent based on the Decision Transformer architecture. By sequentially exchanging the experience replay buffer between the two agents, the agent's training efficiency improves during the first iterations, as does the training of Transformer-based RL agents in situations with limited data availability or unknown environments.
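The record does not describe how the two agents share data or how their actions are mixed, so the following is only a minimal Python sketch under stated assumptions, not the authors' implementation. It illustrates three ideas named in the abstract: an experience replay buffer filled by the online off-policy agent, the conversion of that buffer into return-to-go-conditioned (R_t, s_t, a_t) sequences (the input format used by Decision Transformer), and a mixed policy. The names `ReplayBuffer`, `to_dt_sequences`, `mixed_policy` and `p_transformer`, as well as the probabilistic mixing rule itself, are hypothetical and chosen for illustration.

```python
import random
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List, Tuple

Obs = Tuple[float, ...]
Policy = Callable[[Obs], int]


@dataclass
class Transition:
    obs: Obs
    action: int
    reward: float
    done: bool


class ReplayBuffer:
    """Hypothetical shared buffer: the online agent writes, the Transformer agent reads."""

    def __init__(self, capacity: int) -> None:
        # deque with maxlen silently drops the oldest transition when full
        self.data: Deque[Transition] = deque(maxlen=capacity)

    def add(self, t: Transition) -> None:
        self.data.append(t)


def to_dt_sequences(buffer: ReplayBuffer) -> List[List[Tuple[float, Obs, int]]]:
    """Split the buffer into finished episodes and attach undiscounted returns-to-go."""
    episodes: List[List[Transition]] = []
    current: List[Transition] = []
    for t in buffer.data:
        current.append(t)
        if t.done:
            episodes.append(current)
            current = []
    # an unfinished trailing episode is ignored (its return-to-go is not yet known)
    sequences = []
    for ep in episodes:
        rtg, seq = 0.0, []
        for t in reversed(ep):
            rtg += t.reward  # R_t = sum of rewards from step t to the episode end
            seq.append((rtg, t.obs, t.action))
        sequences.append(list(reversed(seq)))
    return sequences


def mixed_policy(online: Policy, transformer: Policy,
                 p_transformer: float, rng: random.Random) -> Policy:
    """Assumed mixing rule: use the Transformer agent with probability p_transformer."""
    def act(obs: Obs) -> int:
        return transformer(obs) if rng.random() < p_transformer else online(obs)
    return act


if __name__ == "__main__":
    buf = ReplayBuffer(capacity=100)
    for r, done in [(1.0, False), (0.0, False), (2.0, True)]:
        buf.add(Transition(obs=(0.0,), action=0, reward=r, done=done))
    print([rtg for rtg, _, _ in to_dt_sequences(buf)[0]])  # [3.0, 2.0, 2.0]
```

In this sketch, raising `p_transformer` over time would shift control from the off-policy agent to the Transformer agent as the shared buffer grows; whether the paper schedules the mixture this way is not stated in the record.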
Pages: 19