A multi-armed bandit algorithm speeds up the evolution of cooperation

Cited: 0
Authors
Gatti, Roberto Cazzolla [1,2]
Affiliations
[1] Konrad Lorenz Inst Evolut & Cognit Res, Klosterneuburg, Austria
[2] Tomsk State Univ, Biol Inst, Tomsk, Russia
Keywords
Evolution of cooperation; Multi-armed bandit algorithm; Epsilon-greedy model; Matryoshka model; TIT-FOR-TAT; KIN SELECTION; RECIPROCAL ALTRUISM; OPTIMIZATION; RECOGNITION; COMPETITION; BIODIVERSITY; NETWORKS; DEFENSE; RED;
DOI
10.1016/j.ecolmodel.2020.109348
Chinese Library Classification
Q14 [Ecology (Bio-ecology)]
Subject Classification
071012; 0713
Abstract
Most evolutionary biologists consider selfishness an intrinsic feature of our genes and the best choice in social situations. In recent years, prolific research has been conducted on the mechanisms that allow cooperation to emerge "in a world of defectors" and become an evolutionarily stable strategy. A major debate began with W.D. Hamilton's proposal of "kin selection", framed in terms of the cost sustained by cooperators and the benefit received by related conspecifics. Since then, four other main rules for the evolution of cooperation have been suggested. However, one of the main problems with these five rules is the assumption that the payoffs obtained by either cooperating or defecting are well known to the parties before they interact and do not change over time or after repeated encounters. This is not always the case in real life. By following any of these rules blindly, individuals risk getting stuck in an unfavorable situation. Axelrod (1984) highlighted that the main problem is how to obtain the benefits of cooperation without going through many trials and errors, which are slow and painful. With a better understanding of this process, individuals can use their foresight to speed up the evolution of cooperation. Here I show that a multi-armed bandit (MAB) model, a classic problem in decision sciences, is naturally employed by individuals to opt for the best choice most of the time, accelerating the evolution of altruistic behavior and solving the abovementioned problems. A common MAB model that applies extremely well to the evolution of cooperation is the epsilon-greedy (ε-greedy) algorithm. After an initial period of exploration (which can be regarded as biological history), this algorithm greedily exploits the best-known option a fraction 1−ε of the time and explores the other options with the remaining probability ε. Through the ε-greedy decision-making algorithm, cooperation evolves as a multilevel process nested within the hierarchical levels that exist among the five rules for the evolution of cooperation. This kind of reinforcement learning, a subfield of artificial intelligence based on trial and error, provides a powerful tool to better understand, and even probabilistically quantify, the chances that cooperation has to evolve in a specific situation.
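
For readers who want a concrete picture of the decision rule the abstract describes, below is a minimal Python sketch of an ε-greedy bandit, framed as an individual repeatedly choosing among social strategies. This is not the paper's implementation: the function name, the two "arms" (cooperate vs. defect), and their payoff probabilities are illustrative assumptions, and the sketch follows the standard convention of exploring with probability ε and exploiting otherwise.

import random

def epsilon_greedy(payoff_probs, epsilon=0.1, n_rounds=10000, seed=42):
    """Explore a random arm with probability epsilon; otherwise exploit
    the arm with the highest estimated mean payoff so far."""
    rng = random.Random(seed)
    n_arms = len(payoff_probs)
    counts = [0] * n_arms        # times each arm has been played
    estimates = [0.0] * n_arms   # running mean payoff per arm
    total_payoff = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore a random strategy
        else:
            # exploit the strategy with the best estimated payoff
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Bernoulli payoff: 1 with the arm's success probability, else 0
        reward = 1.0 if rng.random() < payoff_probs[arm] else 0.0
        counts[arm] += 1
        # incremental update of the running mean payoff estimate
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        total_payoff += reward
    return estimates, counts, total_payoff

# Hypothetical arms: index 0 = cooperate, index 1 = defect.
# In this toy setting cooperation pays off more often.
estimates, counts, total = epsilon_greedy([0.6, 0.4], epsilon=0.1)
print("estimated payoffs:", estimates)
print("plays per arm:", counts)
print("total payoff:", total)

With ε = 0.1 the agent settles on the higher-paying arm after a brief exploratory phase while still sampling the alternative occasionally, which illustrates the abstract's point that a small, persistent amount of exploration is enough to find, and then exploit, cooperation when it pays.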
Pages: 10