Causal Reinforcement Learning in Iterated Prisoner's Dilemma

Cited by: 3
Authors
Kazemi, Yosra [1]
Chanel, Caroline P. C. [2]
Givigi, Sidney [1]
Affiliations
[1] Queens Univ, Sch Comp, Kingston, ON K7L 2N8, Canada
[2] Univ Toulouse, Inst Super Aeronaut & Espace ISAE SUPAERO, Dept Design & Control Aerosp Vehicles, F-31013 Toulouse, France
Keywords
Causal inference; game theory; prisoner's dilemma (PD); reinforcement learning (RL); social dilemma
DOI
10.1109/TCSS.2023.3289470
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The iterated prisoner's dilemma (IPD) is an archetypal paradigm for modeling cooperation and has guided studies on social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy for a PD game. An agent is designed to have an explicit causal representation of other agents playing strategies from the Axelrod tournament. The resulting collection of policies is assembled into an ensemble RL scheme that chooses the best strategy. The agent is then tested against selected Axelrod tournament strategies as well as an adaptive agent trained using traditional RL. Results show that our agent is able to play against all other players and score higher, while adapting in situations where the strategy of the other players changes. Furthermore, the decisions made by the agent can be explained in terms of the causal representation of the interactions, so a human observer can understand the chosen strategy.
Pages: 2523-2534
Number of pages: 12
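
To make the game setting in the abstract concrete, the following is a minimal sketch of an iterated prisoner's dilemma match between two fixed strategies. It does not implement the paper's CRL agent or its ensemble. The payoff values (T=5, R=3, P=1, S=0) are the conventional Axelrod-tournament numbers and are an assumption here, since this record does not state the payoffs used; the names `PAYOFFS`, `tit_for_tat`, `always_defect`, and `play_match` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# ASSUMPTION: conventional payoffs (T=5, R=3, P=1, S=0); not taken from the record above.
# Keys are (move of player A, move of player B); values are (payoff A, payoff B).
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # A is exploited
    ("D", "C"): (5, 0),  # A exploits B
    ("D", "D"): (1, 1),  # mutual defection
}

# A strategy maps (own history, opponent history) to the next move, "C" or "D".
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own: List[str], opp: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp else opp[-1]


def always_defect(own: List[str], opp: List[str]) -> str:
    """Defect in every round."""
    return "D"


def play_match(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play `rounds` rounds and return the cumulative scores (score_a, score_b)."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        # Both players choose simultaneously from the histories so far.
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play_match(tit_for_tat, always_defect, rounds=10))
    print(play_match(tit_for_tat, tit_for_tat, rounds=10))
```

Under these assumed payoffs, a learning agent of the kind the abstract describes would replace one of the fixed strategies and pick its move from the observed histories.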