Causal Reinforcement Learning in Iterated Prisoner's Dilemma

Cited by: 3
Authors
Kazemi, Yosra [1]
Chanel, Caroline P. C. [2]
Givigi, Sidney [1]
Affiliations
[1] Queens Univ, Sch Comp, Kingston, ON K7L 2N8, Canada
[2] Univ Toulouse, Inst Super Aeronaut & Espace ISAE SUPAERO, Dept Design & Control Aerosp Vehicles, F-31013 Toulouse, France
Keywords
Causal inference; game theory; prisoner's dilemma (PD); reinforcement learning (RL); social dilemma
DOI
10.1109/TCSS.2023.3289470
Chinese Library Classification
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
The iterated prisoner's dilemma (IPD) is an archetypal paradigm to model cooperation and has guided studies on social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy in a PD game. An agent is designed to have an explicit causal representation of other agents playing strategies from the Axelrod tournament. The collection of policies is assembled in an ensemble RL to choose the best strategy. The agent is then tested against selected Axelrod tournament strategies as well as an adaptive agent trained using traditional RL. Results show that our agent is able to play against all other players and score higher while being adaptive in situations where the strategy of the other players changes. Furthermore, the decision taken by the agent can be explained in terms of the causal representation of the interactions. Based on the decision made by the agent, a human observer can understand the chosen strategy.
Pages: 2523 - 2534
Page count: 12
Related papers
50 in total
  • [41] Playing prisoner's dilemma with quantum rules
    Du, Jiangfeng
    Xu, Xiaodong
    Li, Hui
    Zhou, Xianyi
    Han, Rongdian
    FLUCTUATION AND NOISE LETTERS, 2002, 2 (04): : R189 - R203
  • [42] A Survey on Causal Reinforcement Learning
    Zeng, Yan
    Cai, Ruichu
    Sun, Fuchun
    Huang, Libo
    Hao, Zhifeng
    IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (04) : 5942 - 5962
  • [43] Evolutionary dynamics of the prisoner's dilemma with expellers
    Wang, Xiaofeng
    Zhang, Guofeng
    Kong, Weijian
    JOURNAL OF PHYSICS COMMUNICATIONS, 2019, 3 (01):
  • [44] Evolutionary prisoner's dilemma in random graphs
    Durán, O
    Mulet, R
    PHYSICA D-NONLINEAR PHENOMENA, 2005, 208 (3-4) : 257 - 265
  • [45] A Hybrid Application for Prisoner's Dilemma Game
    Carneiro, Yasmin Carolina
    Carpanezi dos Santos, Joao Pedro
    Belgamo, Anderson
    Ferreira, Andre Luiz
    Faleiros, Pedro Bordini
    2022 XVII LATIN AMERICAN CONFERENCE ON LEARNING TECHNOLOGIES (LACLO 2022), 2022, : 87 - 94
  • [46] Causal Campbell-Goodhart's Law and Reinforcement Learning
    Ashton, Hal
    ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021, : 67 - 73
  • [47] On Learning and Co-Learning Effective Strategies in Iterated Travelers' Dilemma
    Tosic, Predrag T.
    2016 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2016), 2016, : 674 - 677
  • [48] Prisoner's dilemma from a moral point of view
    Tilley, JJ
    THEORY AND DECISION, 1996, 41 (02) : 187 - 193
  • [49] Effects of mobility in a population of prisoner's dilemma players
    Meloni, S.
    Buscarino, A.
    Fortuna, L.
    Frasca, M.
    Gomez-Gardenes, J.
    Latora, V.
    Moreno, Y.
    PHYSICAL REVIEW E, 2009, 79 (06):
  • [50] Reaching pareto-optimality in prisoner’s dilemma using conditional joint action learning
    Dipyaman Banerjee
    Sandip Sen
    Autonomous Agents and Multi-Agent Systems, 2007, 15 : 91 - 108