Causal Reinforcement Learning in Iterated Prisoner's Dilemma

Cited by: 3

Authors:

Kazemi, Yosra ^{[1]}

Chanel, Caroline P. C. ^{[2]}

Givigi, Sidney ^{[1]}

Affiliations:

[1] Queens Univ, Sch Comp, Kingston, ON K7L 2N8, Canada

[2] Univ Toulouse, Inst Super Aeronaut & Espace ISAE SUPAERO, Dept Design & Control Aerosp Vehicles, F-31013 Toulouse, France

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024 / Vol. 11 / No. 02

Keywords:

Causal inference; game theory; prisoner's dilemma (PD); reinforcement learning (RL); social dilemma

DOI:

10.1109/TCSS.2023.3289470

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The iterated prisoner's dilemma (IPD) is an archetypal paradigm to model cooperation and has guided studies on social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy in a PD game. An agent is designed to have an explicit causal representation of other agents playing strategies from the Axelrod tournament. The collection of policies is assembled in an ensemble RL to choose the best strategy. The agent is then tested against selected Axelrod tournament strategies as well as an adaptive agent trained using traditional RL. Results show that our agent is able to play against all other players and score higher while being adaptive in situations where the strategy of the other players changes. Furthermore, the decision taken by the agent can be explained in terms of the causal representation of the interactions. Based on the decision made by the agent, a human observer can understand the chosen strategy.


Pages: 2523 - 2534

Number of pages: 12

50 records in total

[21] On the coexistence of cooperators, defectors and conditional cooperators in the multiplayer iterated Prisoner's Dilemma
Grujic, Jelena
Cuesta, Jose A.
Sanchez, Angel
JOURNAL OF THEORETICAL BIOLOGY, 2012, 300 : 299 - 308
[22] The Effect of Memory Size on the Evolutionary Stability of Strategies in Iterated Prisoner's Dilemma
Li, Jiawei
Kendall, Graham
IEEE TRANSACTIONS ON EVOLUTIONARY COMPUTATION, 2014, 18 (06) : 819 - 826
[23] Oyun: A New, Free Program for Iterated Prisoner's Dilemma Tournaments in the Classroom
Pence, Charles H.
Buchak, Lara
EVOLUTION: EDUCATION AND OUTREACH, 2012, 5 (03) : 467 - 476
[24] Stage-game payoff values alter the equilibria of Iterated Prisoner's Dilemma
Torii, Takuma
2015 INTERNATIONAL CONFERENCE ON INTELLIGENT INFORMATICS AND BIOMEDICAL SCIENCES (ICIIBMS), 2015, : 233 - 238
[25] Perfect reciprocity is the only evolutionarily stable strategy in the continuous iterated prisoner's dilemma
Andre, Jean-Baptiste
Day, Troy
JOURNAL OF THEORETICAL BIOLOGY, 2007, 247 (01) : 11 - 22
[26] Measuring social anxiety related interpersonal constraint with the flexible iterated prisoner's dilemma
Rodebaugh, Thomas L.
Klein, Sarah R.
Yarkoni, Tal
Langer, Julia K.
JOURNAL OF ANXIETY DISORDERS, 2011, 25 (03) : 427 - 436
[27] An incentive compatible ZD strategy-based data sharing model for federated learning: A perspective of iterated prisoner's dilemma
Jie, Yingmo
Liu, Charles Zhechao
Choo, Kim-Kwang Raymond
Guo, Cheng
EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2024, 315 (02) : 764 - 776
[28] Computation and the Prisoner's Dilemma
Wooldridge, Michael
IEEE INTELLIGENT SYSTEMS, 2012, 27 (02) : 75 - 80
[29] Using an iterated prisoner's dilemma with exit option to study alliance behavior: Results of a tournament and simulation
Phelan, S. E.
Arend, R. J.
Seale, D. A.
COMPUTATIONAL & MATHEMATICAL ORGANIZATION THEORY, 2005, 11 (04) : 339 - 356
[30] Iterated Prisoner's Dilemma among mobile agents performing 2D random walk
Hizak, Jurica
CROATIAN OPERATIONAL RESEARCH REVIEW, 2021, 12 (02) : 161 - 174
