Extending Q-learning to continuous and mixed strategy games based on spatial reciprocity

Cited by: 13
Authors
Wang, Lu [1, 2]
Zhang, Long [2, 3]
Liu, Yang [2]
Wang, Zhen [1, 2, 4]
Affiliations
[1] Northwestern Polytech Univ, Sch Mech Engn, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
[3] Northwestern Polytech Univ, Sch Comp Sci, Xian 710072, Peoples R China
[4] Northwestern Polytech Univ, Sch Cybersecur, Xian 710072, Peoples R China
Source
PROCEEDINGS OF THE ROYAL SOCIETY A-MATHEMATICAL PHYSICAL AND ENGINEERING SCIENCES | 2023 / Vol. 479 / Issue 2274
Funding
National Natural Science Foundation of China;
Keywords
Q-learning; spatial reciprocity; continuous strategy; mixed strategy; PRISONERS-DILEMMA; SWARM INTELLIGENCE; COOPERATION; DISCRETE; PUNISHMENT; EVOLUTION; BEHAVIOR; SYSTEMS;
DOI
10.1098/rspa.2022.0667
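Chinese Library Classification (CLC) number
O [Mathematical sciences and chemistry]; P [Astronomy and earth sciences]; Q [Biological sciences]; N [General natural sciences];
Discipline classification code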
07 ; 0710 ; 09 ;
Abstract
The discrete strategy game, in which agents can only choose cooperation or defection, has received lots of attention. However, this hypothesis seems implausible in the real world, where choices may be continuous or mixed. Furthermore, when applying Q-learning to continuous or mixed strategy games, one of the challenges is that the learning space grows drastically as the number of states and actions rises. So, in this article, we redesign the Q-learning method by considering the spatial reciprocity, in which agents simply interact with their four neighbours to get the reward and learn the action by taking neighbours' strategy into account. As a result, the learning state and action space is transformed into a 5 x 5 table that stores the state and action of the focal agent and its four neighbours, avoiding the curse of dimensionality caused by a continuous or mixed strategy game. The numerical simulation results reveal the striking differences between the three classes of games. In detail, the discrete strategy game is more vulnerable to the setting of relevant parameters, whereas the other two strategy games are relatively stable. At the same time, in terms of promoting cooperation, a mixed strategy game is always better than a continuous one.
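The following is a minimal sketch of how a 5 x 5 Q-table of this kind could be organised. It is not the authors' implementation. The abstract does not specify the indexing, so the sketch assumes that both state and action are indices 0..4 pointing at the focal agent (0) or one of its four lattice neighbours (1..4), that the lattice has periodic boundaries, and that the standard tabular Q-learning update is used. The names `neighbours`, `new_q_table`, `choose_action` and `q_update`, and the parameters `alpha`, `gamma` and `epsilon`, are hypothetical.

```python
import random

N_SLOTS = 5  # assumed: index 0 is the focal agent, 1..4 are its four neighbours


def neighbours(i, j, size):
    """Four nearest neighbours on a size x size lattice (periodic boundaries assumed)."""
    return [((i - 1) % size, j), ((i + 1) % size, j),
            (i, (j - 1) % size), (i, (j + 1) % size)]


def new_q_table():
    """A 5 x 5 table of zeros: rows are states, columns are actions (assumed layout)."""
    return [[0.0] * N_SLOTS for _ in range(N_SLOTS)]


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, otherwise pick the best column."""
    if rng.random() < epsilon:
        return rng.randrange(N_SLOTS)
    row = q[state]
    return row.index(max(row))


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Standard tabular Q-learning update; returns the new Q-value."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]
```

Because the table only indexes which of the five agents' strategies is referenced, its size stays fixed at 25 entries regardless of whether the underlying strategy is discrete, continuous or mixed, which is the dimensionality argument the abstract makes.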
Pages: 12