Causal Reinforcement Learning in Iterated Prisoner's Dilemma

Cited by: 3

Authors:

Kazemi, Yosra [1]

Chanel, Caroline P. C. [2]

Givigi, Sidney [1]

Affiliations:

[1] Queens Univ, Sch Comp, Kingston, ON K7L 2N8, Canada

[2] Univ Toulouse, Inst Super Aeronaut & Espace ISAE SUPAERO, Dept Design & Control Aerosp Vehicles, F-31013 Toulouse, France

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024 / Vol. 11 / No. 02

Keywords:

Causal inference; game theory; prisoner's dilemma (PD); reinforcement learning (RL); social dilemma

DOI:

10.1109/TCSS.2023.3289470

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The iterated prisoner's dilemma (IPD) is an archetypal paradigm for modeling cooperation and has guided studies on social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy for a PD game. An agent is designed to have an explicit causal representation of other agents playing strategies from the Axelrod tournament. The resulting collection of policies is assembled in an ensemble RL that chooses the best strategy. The agent is then tested against selected Axelrod tournament strategies as well as an adaptive agent trained using traditional RL. Results show that our agent is able to play against all other players and score higher, while adapting when the strategy of the other players changes. Furthermore, the decisions taken by the agent can be explained in terms of the causal representation of the interactions, so a human observer can understand the chosen strategy from the decisions the agent makes.


Pages: 2523 - 2534

Number of pages: 12

50 items in total

[41] Playing prisoner's dilemma with quantum rules
Du, Jiangfeng
Xu, Xiaodong
Li, Hui
Zhou, Xianyi
Han, Rongdian
FLUCTUATION AND NOISE LETTERS, 2002, 2 (04) : R189 - R203
[42] A Survey on Causal Reinforcement Learning
Zeng, Yan
Cai, Ruichu
Sun, Fuchun
Huang, Libo
Hao, Zhifeng
IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2025, 36 (04) : 5942 - 5962
[43] Evolutionary dynamics of the prisoner's dilemma with expellers
Wang, Xiaofeng
Zhang, Guofeng
Kong, Weijian
JOURNAL OF PHYSICS COMMUNICATIONS, 2019, 3 (01):
[44] Evolutionary prisoner's dilemma in random graphs
Durán, O
Mulet, R
PHYSICA D-NONLINEAR PHENOMENA, 2005, 208 (3-4) : 257 - 265
[45] A Hybrid Application for Prisoner's Dilemma Game
Carneiro, Yasmin Carolina
Carpanezi dos Santos, Joao Pedro
Belgamo, Anderson
Ferreira, Andre Luiz
Faleiros, Pedro Bordini
2022 XVII LATIN AMERICAN CONFERENCE ON LEARNING TECHNOLOGIES (LACLO 2022), 2022, : 87 - 94
[46] Causal Campbell-Goodhart's Law and Reinforcement Learning
Ashton, Hal
ICAART: PROCEEDINGS OF THE 13TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2021, : 67 - 73
[47] On Learning and Co-Learning Effective Strategies in Iterated Travelers' Dilemma
Tosic, Predrag T.
2016 IEEE/WIC/ACM INTERNATIONAL CONFERENCE ON WEB INTELLIGENCE (WI 2016), 2016, : 674 - 677
[48] Prisoner's dilemma from a moral point of view
Tilley, JJ
THEORY AND DECISION, 1996, 41 (02) : 187 - 193
[49] Effects of mobility in a population of prisoner's dilemma players
Meloni, S.
Buscarino, A.
Fortuna, L.
Frasca, M.
Gomez-Gardenes, J.
Latora, V.
Moreno, Y.
PHYSICAL REVIEW E, 2009, 79 (06):
[50] Reaching pareto-optimality in prisoner’s dilemma using conditional joint action learning
Dipyaman Banerjee
Sandip Sen
Autonomous Agents and Multi-Agent Systems, 2007, 15 : 91 - 108
