Causal Reinforcement Learning in Iterated Prisoner's Dilemma

Cited by: 3

Authors:

Kazemi, Yosra [1]

Chanel, Caroline P. C. [2]

Givigi, Sidney [1]

Affiliations:

[1] Queens Univ, Sch Comp, Kingston, ON K7L 2N8, Canada

[2] Univ Toulouse, Inst Super Aeronaut & Espace ISAE SUPAERO, Dept Design & Control Aerosp Vehicles, F-31013 Toulouse, France

Source:

IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS | 2024, Vol. 11, No. 2

Keywords:

Causal inference; game theory; prisoner's dilemma (PD); reinforcement learning (RL); social dilemma

DOI:

10.1109/TCSS.2023.3289470

Chinese Library Classification (CLC) number:

TP3 [Computing technology, computer technology]

Discipline classification code:

0812

Abstract:

The iterated prisoner's dilemma (IPD) is an archetypal paradigm for modeling cooperation and has guided studies of social dilemmas. In this work, we develop a causal reinforcement learning (CRL) strategy for the PD game. An agent is designed to hold an explicit causal representation of other agents playing strategies from the Axelrod tournament. The collection of policies is assembled in an ensemble RL scheme that chooses the best strategy. The agent is then tested against selected Axelrod tournament strategies as well as an adaptive agent trained using traditional RL. Results show that our agent is able to play against all other players and score higher, while adapting in situations where the other players' strategy changes. Furthermore, the decisions taken by the agent can be explained in terms of the causal representation of the interactions, so a human observer can understand the chosen strategy from the decisions the agent makes.
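For readers unfamiliar with the IPD setting the abstract refers to, the following is a minimal sketch of a repeated game between two fixed strategies. It is not the authors' CRL agent or their ensemble method. The payoff values (T=5, R=3, P=1, S=0) are the conventional ones and are an assumption here, since the abstract does not state them; the names `PAYOFFS`, `tit_for_tat`, `always_defect`, and `play` are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Tuple

# Payoffs as (player A, player B) for each joint move.
# Assumption: conventional values T=5, R=3, P=1, S=0; the abstract does not give them.
PAYOFFS = {
    ("C", "C"): (3, 3),  # mutual cooperation
    ("C", "D"): (0, 5),  # A is exploited
    ("D", "C"): (5, 0),  # A exploits B
    ("D", "D"): (1, 1),  # mutual defection
}

# A strategy maps (own history, opponent history) to the next move, "C" or "D".
Strategy = Callable[[List[str], List[str]], str]


def tit_for_tat(own: List[str], opp: List[str]) -> str:
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opp else opp[-1]


def always_defect(own: List[str], opp: List[str]) -> str:
    """Defect on every round regardless of history."""
    return "D"


def play(a: Strategy, b: Strategy, rounds: int = 10) -> Tuple[int, int]:
    """Play `rounds` rounds of the IPD and return the cumulative scores."""
    hist_a: List[str] = []
    hist_b: List[str] = []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = a(hist_a, hist_b)
        move_b = b(hist_b, hist_a)
        pay_a, pay_b = PAYOFFS[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


if __name__ == "__main__":
    print(play(tit_for_tat, tit_for_tat))
    print(play(tit_for_tat, always_defect))
```

A learning agent such as the one described in the abstract would replace one of the fixed strategy functions with a policy that is updated from the observed histories.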

Pages: 2523 - 2534

Page count: 12

50 items in total

[31] Evolving learning rules and emergence of cooperation in spatial prisoner's dilemma
Moyano, Luis G.
Sanchez, Angel
JOURNAL OF THEORETICAL BIOLOGY, 2009, 259 (01) : 84 - 95
[32] Experience the Prisoner's Dilemma: a game-based learning tool
Lorente, Pablo Jose
Pereda, Maria
DIRECCION Y ORGANIZACION, 2024, 83 : 18 - 27
[33] THE PRISONER'S DILEMMA: AN ANARCHIST READING
Rempel, Martin
EN LETRA, 2016, (06) : 67 - 93
[34] A fuzzy approach to the prisoner's dilemma
Borges, PSS
Pacheco, RCS
Barcia, RM
Khator, SK
BIOSYSTEMS, 1997, 41 (02) : 127 - 137
[35] The undecidability of the spatialized prisoner's dilemma
Grim, Patrick
THEORY AND DECISION, 1997, 42 : 53 - 80
[36] The undecidability of the spatialized prisoner's dilemma
Grim, P
THEORY AND DECISION, 1997, 42 (01) : 53 - 80
[37] Prisoner's Dilemma Game on Network
Ono, Masahiro
Ishizuka, Mitsuru
MULTI-AGENT SYSTEMS FOR SOCIETY, 2009, 4078 : 33 - 44
[38] Towards Circular and Asymmetric Cooperation in a Multi-player Graph-based Iterated Prisoner's Dilemma
Le Gleau, Tangui
Marjou, Xavier
Lemlouma, Tayeb
Radier, Benoit
ICAART: PROCEEDINGS OF THE 14TH INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE - VOL 2, 2022 : 293 - 303
[39] The Prisoner's Dilemma in Access Control
He, Jing-sha
Zhang, Yi-xuan
Zhou, Shi-yi
Liu, Ruo-hong
INTERNATIONAL CONFERENCE ON COMPUTER, NETWORK SECURITY AND COMMUNICATION ENGINEERING (CNSCE 2014), 2014 : 303 - 306
[40] Rejoinder to Kritikos and Bolle: making indenture viable - the extortionary power of pre-commitment in iterated prisoner's dilemma
Holt, G
JOURNAL OF ECONOMIC BEHAVIOR & ORGANIZATION, 2000, 43 (03) : 393 - 395
