Local and global stimuli in reinforcement learning

Cited by: 66
Authors
Jia, Danyang [1,2]
Guo, Hao [1,2]
Song, Zhao [1,2]
Shi, Lei [3]
Deng, Xinyang [4]
Perc, Matjaz [5,6,7]
Wang, Zhen [1,2]
Affiliations
[1] Northwestern Polytech Univ, Sch Mech Engn, Xian 710072, Peoples R China
[2] Northwestern Polytech Univ, Sch Artificial Intelligence Opt & Elect iOPEN, Xian 710072, Peoples R China
[3] Yunnan Univ Finance & Econ, Sch Stat & Math, Kunming 650221, Yunnan, Peoples R China
[4] Northwestern Polytech Univ, Sch Elect & Informat, Xian 710072, Peoples R China
[5] Univ Maribor, Fac Nat Sci & Math, Koroska Cesta 160, Maribor 2000, Slovenia
[6] China Med Univ, China Med Univ Hosp, Dept Med Res, Taichung, Taiwan
[7] Complex Sci Hub Vienna, Josefstadterstr 39, A-1080 Vienna, Austria
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
reinforcement learning; local and global stimuli; conditional cooperation; moody conditional cooperation; TIT-FOR-TAT; PRISONERS-DILEMMA; INTENTION RECOGNITION; EVOLUTIONARY GAMES; LOSE-SHIFT; WIN-STAY; COOPERATION; DYNAMICS; ALTRUISM; PROMOTES;
DOI
10.1088/1367-2630/ac170a
Chinese Library Classification (CLC) number
O4 [Physics]
Discipline classification code
0702
Abstract
In efforts to resolve social dilemmas, reinforcement learning is an alternative to imitation and exploration in evolutionary game theory. While imitation and exploration rely on the performance of neighbors, in reinforcement learning individuals alter their strategies based on their own performance in the past. For example, according to the Bush-Mosteller model of reinforcement learning, an individual's strategy choice is driven by whether the received payoff satisfies a preset aspiration or not. Stimuli also play a key role in reinforcement learning in that they can determine whether a strategy should be kept or not. Here we use the Monte Carlo method to study pattern formation and phase transitions towards cooperation in social dilemmas that are driven by reinforcement learning. We distinguish local and global players according to the source of the stimulus they experience. While global players receive their stimuli from the whole neighborhood, local players focus solely on individual performance. We show that global players play a decisive role in ensuring cooperation, while local players fail in this regard, although both types of players show properties of 'moody cooperators'. In particular, global players evoke stronger conditional cooperation in their neighborhoods based on direct reciprocity, which is rooted in the emerging spatial patterns and stronger interfaces around cooperative clusters.
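The aspiration-driven update described in the abstract can be illustrated with a minimal sketch. It assumes the commonly used textbook form of the Bush-Mosteller model, in which a stimulus s = tanh(sensitivity * (payoff - aspiration)) raises the probability of repeating an action when s is positive and lowers it when s is negative. This record does not give the paper's equations, so the function names, the tanh form, the `sensitivity` parameter, and the choice to average payoffs over the focal player and its neighbors for the global stimulus are all assumptions, not the authors' definitions.

```python
import math


def stimulus(payoff, aspiration, sensitivity=1.0):
    """Map a payoff to a stimulus in (-1, 1); positive means the aspiration was met.

    The tanh form and the `sensitivity` parameter are assumptions of this sketch.
    """
    return math.tanh(sensitivity * (payoff - aspiration))


def bush_mosteller_update(p_coop, cooperated, s):
    """Return the updated cooperation probability after one round.

    A satisfying outcome (s >= 0) reinforces the action just taken;
    an unsatisfying one (s < 0) pushes the player away from it.
    """
    if cooperated:
        if s >= 0:
            return p_coop + (1.0 - p_coop) * s
        return p_coop + p_coop * s
    if s >= 0:
        return p_coop - p_coop * s
    return p_coop - (1.0 - p_coop) * s


def local_stimulus(own_payoff, aspiration, sensitivity=1.0):
    """Local player: the stimulus depends only on the player's own payoff."""
    return stimulus(own_payoff, aspiration, sensitivity)


def global_stimulus(own_payoff, neighbor_payoffs, aspiration, sensitivity=1.0):
    """Global player: the stimulus depends on the whole neighborhood.

    Averaging over the focal player plus its neighbors is an assumption here.
    """
    payoffs = [own_payoff, *neighbor_payoffs]
    mean_payoff = sum(payoffs) / len(payoffs)
    return stimulus(mean_payoff, aspiration, sensitivity)
```

For example, a cooperator with `p_coop = 0.5` who receives a stimulus of `0.5` moves to `0.75`, while the same stimulus after defecting moves the player to `0.25`. Because the stimulus stays strictly between -1 and 1, the updated value remains a valid probability.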
Pages: 11