An Adaptive Strategy via Reinforcement Learning for the Prisoner's Dilemma Game

Cited by: 1
Authors
Lei Xue [1]
Changyin Sun [2, 1]
Donald Wunsch [2, 3]
Yingjiang Zhou [4]
Fang Yu [5]
Affiliations
[1] Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, School of Automation, Southeast University
[2] IEEE
[3] Department of Electrical and Computer Engineering, Missouri University of Science and Technology
[4] College of Automation, Nanjing University of Posts and Telecommunications
[5] Institute of Logistics Science and Engineering, Shanghai Maritime University
Funding
China Postdoctoral Science Foundation
Keywords
Complex network; prisoner’s dilemma; reinforcement learning; temporal difference learning
DOI
Not available
Chinese Library Classification number
O157.5 [Graph theory]
Discipline classification code
070104
Abstract
The iterated prisoner’s dilemma (IPD) is an ideal model for analyzing interactions between agents in complex networks. It has attracted wide interest in the development of novel strategies since the success of tit-for-tat in Axelrod’s tournament. This paper studies a new adaptive strategy for the IPD in different complex networks, where agents can learn and adapt their strategies through a reinforcement learning method. A temporal difference learning method is applied to design the adaptive strategy and to optimize the agents’ decision-making process. Previous studies indicated that mutual cooperation rarely emerges in the IPD. Therefore, three examples based on a square lattice network and a scale-free network are provided to show two features of the adaptive strategy. First, mutual cooperation can be achieved by a group of adaptive agents on a scale-free network, and once evolution has converged to mutual cooperation, it is unlikely to shift. Second, the adaptive strategy earns a better payoff than other strategies on the square lattice network. Analytical properties are discussed to verify the evolutionary stability of the adaptive strategy.
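To make the idea of a temporal difference learner in the IPD concrete, the following is a minimal sketch and not the paper's algorithm. It assumes tabular Q-learning (one common temporal difference method), a single tit-for-tat opponent instead of a network of neighbors, and the payoff values T=5, R=3, P=1, S=0 often used in Axelrod-style tournaments. The names `td_update`, `choose_action`, and `play_against_tit_for_tat`, as well as the learning rate, discount factor, and exploration rate, are hypothetical choices made for illustration.

```python
import random

C, D = 0, 1  # cooperate, defect

# Assumed payoffs (T=5, R=3, P=1, S=0); the record does not state the paper's values.
# Key: (my move, opponent's move) -> my payoff.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def td_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one Q-learning (temporal difference) update to q in place."""
    target = reward + gamma * max(q[next_state])
    q[state][action] += alpha * (target - q[state][action])


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice: explore with probability epsilon, else act greedily."""
    if rng.random() < epsilon:
        return rng.choice([C, D])
    return C if q[state][C] >= q[state][D] else D


def play_against_tit_for_tat(rounds=5000, epsilon=0.1, seed=0):
    """Train a tabular agent against tit-for-tat and return its Q-table.

    The state is the agent's own previous move, which is exactly the move
    tit-for-tat plays in the current round (a hypothetical, minimal state).
    """
    rng = random.Random(seed)
    q = [[0.0, 0.0], [0.0, 0.0]]  # q[state][action]
    state = C  # tit-for-tat opens with cooperation
    for _ in range(rounds):
        action = choose_action(q, state, epsilon, rng)
        opponent_move = state  # tit-for-tat repeats the agent's last move
        reward = PAYOFF[(action, opponent_move)]
        next_state = action
        td_update(q, state, action, reward, next_state)
        state = next_state
    return q


if __name__ == "__main__":
    q_table = play_against_tit_for_tat()
    for s, name in ((C, "after cooperating"), (D, "after defecting")):
        print(name, "-> Q(cooperate) =", round(q_table[s][C], 2),
              ", Q(defect) =", round(q_table[s][D], 2))
```

Under these assumptions, the discount factor controls how much the agent weighs future retaliation by tit-for-tat against the one-round gain from defecting. Extending the sketch to a square lattice or scale-free network would require a neighbor structure and a rule for aggregating payoffs, neither of which is specified in this record.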
Pages: 301-310
Page count: 10