Reinforcement learning with predefined and inferred reward machines in stochastic games

Cited by: 0
Authors
Hu, Jueming [1 ]
Paliwal, Yash [1 ]
Kim, Hyohun [1 ]
Wang, Yanze [1 ]
Xu, Zhe [1 ]
Affiliations
[1] Arizona State Univ, Tempe, AZ 85281 USA
Keywords
Reinforcement learning; Non-Markovian rewards; Reward machine; Non-cooperative stochastic game
DOI
10.1016/j.neucom.2024.128170
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper focuses on Multi-Agent Reinforcement Learning (MARL) in non-cooperative stochastic games, particularly addressing the challenge of completing tasks characterized by non-Markovian reward functions. We employ Reward Machines (RMs) to incorporate high-level task knowledge. First, we introduce Q-learning with Reward Machines for Stochastic Games (QRM-SG), where RMs are predefined and available to the agents. QRM-SG learns each agent's best-response policy at a Nash equilibrium by defining the Q-function over an augmented state space that combines the stochastic-game state and the RM states. At each time step, the Lemke-Howson method is used to compute the best-response policies for the stage game defined by the current Q-functions. We then explore a more challenging scenario in which RMs are unavailable and propose Multi-Agent Reinforcement learning with Concurrent High-level knowledge inference (MARCH). MARCH uses automata learning to infer RMs iteratively and combines this inference with QRM-SG to learn the best-response policies. RL episodes in which the obtained rewards are inconsistent with the rewards predicted by the current RMs trigger the inference of new RMs. We prove that QRM-SG and MARCH converge to the best-response policies under certain conditions. Experiments in two scenarios demonstrate the superior performance of QRM-SG and MARCH over baseline methods.
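To make the core mechanism concrete, the following is a minimal Python sketch of the augmented-state Q-update that underlies QRM-SG, simplified to a single learner with epsilon-greedy exploration; the actual method maintains one Q-function per agent and solves the stage game at every step via the Lemke-Howson method. The toy environment, the labeling function label, the reward-machine tables RM_DELTA and RM_REWARD, and the trace_inconsistent trigger are illustrative assumptions, not the paper's implementation.

import random
from collections import defaultdict

# Toy environment (assumed for illustration): states 0..3 on a line;
# action 0 moves left, action 1 moves right.
def env_step(s, a):
    return max(0, min(3, s + (1 if a == 1 else -1)))

def label(s):
    # Labeling function: high-level events emitted at low-level states.
    return {0: "a", 3: "b"}.get(s, "")

# Hypothetical reward machine for the non-Markovian task "reach a, then b":
# a transition function and transition rewards as lookup tables.
RM_DELTA = {(0, "a"): 1, (1, "b"): 2}   # (RM state, event) -> next RM state
RM_REWARD = {(1, "b"): 1.0}             # reward paid on completing the task
TERMINAL_U = 2

def rm_step(u, event):
    return RM_DELTA.get((u, event), u), RM_REWARD.get((u, event), 0.0)

def qrm_sketch(episodes=500, alpha=0.1, gamma=0.95, eps=0.1, horizon=20):
    # Q is defined over the augmented state (s, u): the environment state s
    # paired with the RM state u, which restores the Markov property of
    # the otherwise non-Markovian reward.
    Q = defaultdict(float)
    for _ in range(episodes):
        s, u = 1, 0
        for _ in range(horizon):
            if random.random() < eps:
                a = random.randrange(2)
            else:
                a = max((0, 1), key=lambda act: Q[((s, u), act)])
            s2 = env_step(s, a)
            u2, r = rm_step(u, label(s2))
            if u2 == TERMINAL_U:
                target = r
            else:
                target = r + gamma * max(Q[((s2, u2), act)] for act in (0, 1))
            Q[((s, u), a)] += alpha * (target - Q[((s, u), a)])
            s, u = s2, u2
            if u == TERMINAL_U:
                break
    return Q

def trace_inconsistent(trace):
    # MARCH-style trigger (sketch): an episode trace of (event, reward)
    # pairs is inconsistent if the current RM predicts different rewards;
    # such a trace would trigger inference of a new RM via automata learning.
    u = 0
    for event, observed_r in trace:
        u, predicted_r = rm_step(u, event)
        if predicted_r != observed_r:
            return True
    return False

if __name__ == "__main__":
    Q = qrm_sketch()
    # Moving left first (toward event "a") should score higher from the start.
    print(Q[((1, 0), 0)], Q[((1, 0), 1)])

The augmented state (s, u) is what makes plain Q-learning sound here: conditioning on the RM state turns the history-dependent reward into a Markovian one, so standard temporal-difference updates apply.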
Pages: 19