A multi-armed bandit algorithm speeds up the evolution of cooperation

Cited: 0
Authors
Gatti, Roberto Cazzolla [1,2]
Affiliations
[1] Konrad Lorenz Inst Evolut & Cognit Res, Klosterneuburg, Austria
[2] Tomsk State Univ, Biol Inst, Tomsk, Russia
Keywords
Evolution of cooperation; Multi-armed bandit algorithm; Epsilon-greedy model; Matryoshka model; TIT-FOR-TAT; KIN SELECTION; RECIPROCAL ALTRUISM; OPTIMIZATION; RECOGNITION; COMPETITION; BIODIVERSITY; NETWORKS; DEFENSE; RED;
DOI
10.1016/j.ecolmodel.2020.109348
Chinese Library Classification
Q14 [Ecology (Bio-ecology)]
Subject Classification
071012; 0713
Abstract
Most evolutionary biologists consider selfishness an intrinsic feature of our genes and the best choice in social situations. In recent years, prolific research has been conducted on the mechanisms that allow cooperation to emerge "in a world of defectors" and become an evolutionarily stable strategy. A major debate began with W.D. Hamilton's proposal of "kin selection", framed in terms of the cost sustained by cooperators and the benefit received by related conspecifics. Since then, four other main rules for the evolution of cooperation have been suggested. However, one of the main problems with these five rules is the assumption that the payoffs obtained by either cooperating or defecting are well known to the parties before they interact and do not change over time or after repeated encounters. This is not always the case in real life. By following any of these rules blindly, individuals risk getting stuck in an unfavorable situation. Axelrod (1984) highlighted that the main problem is how to obtain the benefits of cooperation without going through many trials and errors, which are slow and painful. With a better understanding of this process, individuals can use their foresight to speed up the evolution of cooperation. Here I show that a multi-armed bandit (MAB) model, a classic problem in decision sciences, is naturally employed by individuals to opt for the best choice most of the time, accelerating the evolution of altruistic behavior and solving the abovementioned problems. A common MAB model that applies extremely well to the evolution of cooperation is the epsilon-greedy (ε-greedy) algorithm. After an initial period of exploration (which can be regarded as biological history), this algorithm greedily exploits the best-known option a fraction 1−ε of the time and explores the other options with the remaining probability ε. Through the ε-greedy decision-making algorithm, cooperation evolves as a multilevel process nested within the hierarchical levels that exist among the five rules for the evolution of cooperation. This kind of reinforcement learning, a subfield of artificial intelligence based on trial and error, provides a powerful tool to better understand, and even probabilistically quantify, the chances that cooperation has to evolve in a specific situation.
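
For readers who want a concrete picture of the decision rule the abstract describes, below is a minimal Python sketch of an ε-greedy bandit, framed as an individual repeatedly choosing among social strategies. This is not the paper's implementation: the function name, the two "arms" (cooperate vs. defect), and their payoff probabilities are illustrative assumptions, and the sketch follows the standard convention of exploring with probability ε and exploiting otherwise.

import random

def epsilon_greedy(payoff_probs, epsilon=0.1, n_rounds=10000, seed=42):
    """Explore a random arm with probability epsilon; otherwise exploit
    the arm with the highest estimated mean payoff so far."""
    rng = random.Random(seed)
    n_arms = len(payoff_probs)
    counts = [0] * n_arms        # times each arm has been played
    estimates = [0.0] * n_arms   # running mean payoff per arm
    total_payoff = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore a random strategy
        else:
            # exploit the strategy with the best estimated payoff
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Bernoulli payoff: 1 with the arm's success probability, else 0
        reward = 1.0 if rng.random() < payoff_probs[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean payoff estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_payoff += reward
    return estimates, counts, total_payoff

# Hypothetical arms: index 0 = cooperate, index 1 = defect.
# In this toy setting cooperation pays off more often.
estimates, counts, total = epsilon_greedy([0.6, 0.4], epsilon=0.1)
print("estimated payoffs:", estimates)
print("plays per arm:", counts)
print("total payoff:", total)

With ε = 0.1 the agent settles on the higher-paying arm after a brief exploratory phase while still sampling the alternative occasionally, which illustrates the abstract's point that a small, persistent amount of exploration is enough to find, and then exploit, cooperation when it pays.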
Pages: 10