Levy noise promotes cooperation in the prisoner's dilemma game with reinforcement learning

Cited by: 62
Authors
Wang, Lu [1, 2]
Jia, Danyang [1, 2]
Zhang, Long [2, 3]
Zhu, Peican [2, 3]
Perc, Matjaz [4, 5, 6, 7]
Shi, Lei [8]
Wang, Zhen [9, 10]
Affiliations
[1] Northwestern Polytech Univ, Sch Mech Engn, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
[3] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[4] Univ Maribor, Fac Nat Sci & Math, Maribor, Slovenia
[5] China Med Univ, China Med Univ Hosp, Dept Med Res, Taichung 404332, Taiwan
[6] Complex Sci Hub Vienna, Vienna, Austria
[7] Alma Mater Europaea, Maribor, Slovenia
[8] Yunnan Univ Finance & Econ, Sch Math & Stat, Kunming 650221, Yunnan, Peoples R China
[9] Northwestern Polytech Univ, Sch Mech Engn, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
[10] Northwestern Polytech Univ, Sch Cybersecur, Xian 710072, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China;
Keywords
Evolutionary dynamics; Prisoner's dilemma; Cooperation; Self-regarding Q-learning; Levy noise; TIT-FOR-TAT; PUNISHMENT;
DOI
10.1007/s11071-022-07289-7
Chinese Library Classification number
TH [Machinery and instrument industry];
Discipline classification code
0802 ;
Abstract
Uncertainties are ubiquitous in everyday life, and it is thus important to explore their effects on the evolution of cooperation. In this paper, the prisoner's dilemma game with reinforcement learning subject to Levy noise is studied. Specifically, diverse fluctuations mimicked by Levy distributed noise are reflected in the payoff matrix of each player. At the same time, the self-regarding Q-learning algorithm is considered as the strategy update rule to learn the behavior that achieves the highest payoff. The results show that not only does Levy noise promote the evolution of cooperation with reinforcement learning, it does so comparatively better than Gaussian noise. We explain this with the iterative updating pattern of the self-regarding Q-learning algorithm, which has an accumulative effect on the noise entering the payoff matrix. It turns out that under Levy noise, the Q-value of cooperative behavior becomes significantly larger than that of defective behavior when the current strategy is defection, which ultimately leads to the prevalence of cooperation, while this is absent with Gaussian noise or without noise. This research thus unveils a particular positive role of Levy noise in the evolutionary dynamics of social dilemmas.
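The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and everything beyond the abstract is an assumption: the function names (`symmetric_stable`, `noisy_payoff`, `q_update`) are hypothetical, the payoff matrix is taken to be a weak prisoner's dilemma (R = 1, T = b, S = P = 0), the noise is taken to be additive and symmetric alpha-stable, and the learning rate and discount factor are arbitrary. The agent's state is its own current strategy, which is one reading of "self-regarding".

```python
import math
import random

C, D = 0, 1  # actions; also the states, since the agent conditions on its own strategy


def symmetric_stable(alpha, scale, rng):
    """One draw from a symmetric alpha-stable (Levy) law, 0 < alpha <= 2,
    via the Chambers-Mallows-Stuck method; alpha = 2 gives a Gaussian."""
    v = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    if alpha == 1.0:  # Cauchy special case
        return scale * math.tan(v)
    x = (math.sin(alpha * v) / math.cos(v) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * v) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def noisy_payoff(mine, other, b, alpha, scale, rng):
    """Assumed weak prisoner's dilemma (R=1, T=b, S=P=0) plus additive noise."""
    base = [[1.0, 0.0], [b, 0.0]][mine][other]
    return base + symmetric_stable(alpha, scale, rng)


def q_update(q, state, action, reward, lr=0.1, discount=0.9):
    """Standard Q-learning step; the next state is the agent's own action."""
    next_state = action
    target = reward + discount * max(q[next_state])
    q[state][action] = (1.0 - lr) * q[state][action] + lr * target
    return next_state
```

Because each update blends the previous Q-value with a new noisy reward, occasional large Lévy jumps persist in the table over many rounds, which is the accumulative effect the abstract refers to. Setting `alpha = 2` in this sketch gives the Gaussian comparison case.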
Pages: 1837-1845
Page count: 9