A Hybrid Online Off-Policy Reinforcement Learning Agent Framework Supported by Transformers

Cited by: 2
Authors
Villarrubia-Martin, Enrique Adrian [1]
Rodriguez-Benitez, Luis [1]
Jimenez-Linares, Luis [1]
Munoz-Valero, David [2]
Liu, Jun [3]
Affiliations
[1] Univ Castilla La Mancha, Dept Technol & Informat Syst, Paseo Univ, Ciudad Real 413005, Spain
[2] Univ Castilla La Mancha, Dept Technol & Informat Syst, Ave Carlos III,s-n, Toledo 45004, Spain
[3] Univ Ulster, Sch Comp, Belfast, North Ireland
Keywords
Reinforcement learning; self-attention; off-policy; Transformer; experience replay; LEVEL;
DOI
10.1142/S012906572350065X
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Reinforcement learning (RL) is a powerful technique that allows agents to learn optimal decision-making policies through interactions with an environment. However, traditional RL algorithms suffer from several limitations, such as the need for large amounts of data and the difficulty of long-term credit assignment, i.e. the problem of determining which actions actually produce a certain reward. Recently, Transformers have shown their capacity to address these constraints in an offline setting. This paper proposes a framework that uses Transformers to enhance the training of online off-policy RL agents and to address the challenges described above through self-attention. The proposal introduces a hybrid agent with a mixed policy that combines an online off-policy agent with an offline Transformer agent based on the Decision Transformer architecture. By sequentially exchanging the experience replay buffer between the two agents, the framework improves the agent's training efficiency in the first iterations, as well as the training of Transformer-based RL agents in situations with limited data availability or unknown environments.
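The buffer exchange described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the class `SharedReplayBuffer`, the helper `returns_to_go`, and the toy integer states and actions are all hypothetical names chosen for illustration. The sketch assumes only that one buffer of episodes can be read in two ways: as individual transitions (the usual input of an online off-policy agent) and as per-episode sequences tagged with the undiscounted return-to-go (the kind of input a Decision-Transformer-style model is commonly trained on).

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

# Toy transition: (state, action, reward). Real agents would store richer data.
Transition = Tuple[int, int, float]


def returns_to_go(rewards: List[float]) -> List[float]:
    """Undiscounted sum of the current and all future rewards at each step."""
    out: List[float] = []
    total = 0.0
    for r in reversed(rewards):
        total += r
        out.append(total)
    return out[::-1]


@dataclass
class SharedReplayBuffer:
    """Hypothetical buffer read by both agents (an assumption of this sketch)."""
    episodes: List[List[Transition]] = field(default_factory=list)

    def add_episode(self, episode: List[Transition]) -> None:
        self.episodes.append(list(episode))

    def sample_transitions(self, n: int, rng: random.Random) -> List[Transition]:
        # View used by an online off-policy agent: independent transitions.
        flat = [t for ep in self.episodes for t in ep]
        return rng.sample(flat, min(n, len(flat)))

    def sequences(self) -> List[List[Tuple[float, int, int]]]:
        # View used by a sequence model: (return-to-go, state, action) per step.
        result = []
        for ep in self.episodes:
            rtg = returns_to_go([r for _, _, r in ep])
            result.append([(g, s, a) for g, (s, a, _) in zip(rtg, ep)])
        return result


buf = SharedReplayBuffer()
buf.add_episode([(0, 1, 1.0), (1, 0, 0.0), (2, 1, 2.0)])
batch = buf.sample_transitions(2, random.Random(0))  # two random transitions
seqs = buf.sequences()  # one sequence with returns-to-go 3.0, 2.0, 2.0
```

Keeping whole episodes rather than a flat list of transitions is a design choice of this sketch: it lets the same data serve both views without duplication.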
Pages: 19