Levy noise promotes cooperation in the prisoner's dilemma game with reinforcement learning

Cited by: 62

Authors:

Wang, Lu [1,2]

Jia, Danyang [1,2]

Zhang, Long [2,3]

Zhu, Peican [2,3]

Perc, Matjaz [4,5,6,7]

Shi, Lei [8]

Wang, Zhen [9,10]

Affiliations:

[1] Northwestern Polytech Univ, Sch Mech Engn, Xian 710072, Peoples R China

[2] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China

[3] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China

[4] Univ Maribor, Fac Nat Sci & Math, Maribor, Slovenia

[5] China Med Univ, China Med Univ Hosp, Dept Med Res, Taichung 404332, Taiwan

[6] Complex Sci Hub Vienna, Vienna, Austria

[7] Alma Mater Europaea, Maribor, Slovenia

[8] Yunnan Univ Finance & Econ, Sch Math & Stat, Kunming 650221, Yunnan, Peoples R China

[9] Northwestern Polytech Univ, Sch Mech Engn, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China

[10] Northwestern Polytech Univ, Sch Cybersecur, Xian 710072, Peoples R China

Source:

NONLINEAR DYNAMICS | 2022 / Vol. 108 / Issue 02

Funding:

National Natural Science Foundation of China; National Key Research and Development Program;

Keywords:

Evolutionary dynamics; Prisoner's dilemma; Cooperation; Self-regarding Q-learning; Levy noise; TIT-FOR-TAT; PUNISHMENT;

DOI:

10.1007/s11071-022-07289-7

Chinese Library Classification number:

TH [Machinery and Instrument Industry];

Discipline classification code:

0802;

Abstract:

Uncertainties are ubiquitous in everyday life, and it is thus important to explore their effects on the evolution of cooperation. In this paper, the prisoner's dilemma game with reinforcement learning subject to Levy noise is studied. Specifically, diverse fluctuations mimicked by Levy distributed noise are reflected in the payoff matrix of each player. At the same time, the self-regarding Q-learning algorithm is considered as the strategy update rule to learn the behavior that achieves the highest payoff. The results show that not only does Levy noise promote the evolution of cooperation with reinforcement learning, it does so comparatively better than Gaussian noise. We explain this with the iterative updating pattern of the self-regarding Q-learning algorithm, which has an accumulative effect on the noise entering the payoff matrix. It turns out that under Levy noise, the Q-value of cooperative behavior becomes significantly larger than that of defective behavior when the current strategy is defection, which ultimately leads to the prevalence of cooperation, while this is absent with Gaussian noise or without noise. This research thus unveils a particular positive role of Levy noise in the evolutionary dynamics of social dilemmas.
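The mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions not stated in this record: a weak prisoner's dilemma payoff (reward 1, temptation `temptation`, zero otherwise), random pairing in a well-mixed population, epsilon-greedy action selection, and noise added to each realized payoff rather than to stored matrix entries. Levy noise is approximated by a symmetric alpha-stable variate drawn with the Chambers-Mallows-Stuck method. All function names, parameter names, and default values are hypothetical. "Self-regarding" is taken here to mean that an agent's state is its own previous action.

```python
import math
import random

C, D = 0, 1  # actions; also used as states (the agent's own previous action)


def levy_stable_sample(rng, alpha, scale=1.0):
    """One symmetric alpha-stable draw (Chambers-Mallows-Stuck), 0 < alpha <= 2."""
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    while w == 0.0:  # guard against a zero exponential draw
        w = rng.expovariate(1.0)
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return scale * x


def payoff(a, b, temptation):
    """Assumed weak prisoner's dilemma payoff to the player choosing a against b."""
    if a == C:
        return 1.0 if b == C else 0.0
    return temptation if b == C else 0.0


def simulate(n_agents=100, rounds=2000, temptation=1.2, lr=0.1, gamma=0.9,
             epsilon=0.02, noise_alpha=1.5, noise_scale=0.1, seed=0):
    """Return (final fraction of cooperators, Q-tables indexed q[i][state][action])."""
    rng = random.Random(seed)
    q = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_agents)]
    state = [rng.choice((C, D)) for _ in range(n_agents)]
    for _ in range(rounds):
        # Epsilon-greedy choice from the row of the Q-table for the current state.
        action = []
        for i in range(n_agents):
            qc, qd = q[i][state[i]]
            if rng.random() < epsilon or qc == qd:
                action.append(rng.choice((C, D)))
            else:
                action.append(C if qc > qd else D)
        # Random pairing (assumption); with an odd population one agent sits out.
        order = list(range(n_agents))
        rng.shuffle(order)
        for k in range(0, n_agents - 1, 2):
            i, j = order[k], order[k + 1]
            for me, other in ((i, j), (j, i)):
                r = payoff(action[me], action[other], temptation)
                if noise_scale > 0:
                    r += levy_stable_sample(rng, noise_alpha, noise_scale)
                s, a = state[me], action[me]
                # Q-learning update; the next state is the action just taken.
                target = r + gamma * max(q[me][a])
                q[me][s][a] += lr * (target - q[me][s][a])
        state = action
    return sum(1 for a in state if a == C) / n_agents, q
```

Calling `simulate(noise_alpha=2.0)` gives the Gaussian limit of the stable family and `simulate(noise_scale=0.0)` removes the noise, so the three cases compared in the abstract can be run side by side. The quantity the abstract singles out corresponds to `q[i][D][C]` versus `q[i][D][D]`. Whether this simplified setup reproduces the reported effect is not claimed here.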

Pages: 1837-1845

Page count: 9
