A multi-armed bandit algorithm speeds up the evolution of cooperation

Cited: 0
Authors
Gatti, Roberto Cazzolla [1,2]
Affiliations
[1] Konrad Lorenz Inst Evolut & Cognit Res, Klosterneuburg, Austria
[2] Tomsk State Univ, Biol Inst, Tomsk, Russia
Keywords
Evolution of cooperation; Multi-armed bandit algorithm; Epsilon-greedy model; Matryoshka model; TIT-FOR-TAT; KIN SELECTION; RECIPROCAL ALTRUISM; OPTIMIZATION; RECOGNITION; COMPETITION; BIODIVERSITY; NETWORKS; DEFENSE; RED;
DOI
10.1016/j.ecolmodel.2020.109348
Chinese Library Classification (CLC)
Q14 [Ecology (Bioecology)]
Discipline Codes
071012; 0713
Abstract
Most evolutionary biologists consider selfishness an intrinsic feature of our genes and the best choice in social situations. In recent years, prolific research has been conducted on the mechanisms that allow cooperation to emerge "in a world of defectors" and become an evolutionarily stable strategy. A major debate began with W.D. Hamilton's proposal of "kin selection", framed in terms of the cost sustained by cooperators and the benefit received by related conspecifics. Since then, four other main rules for the evolution of cooperation have been suggested. A central problem shared by these five rules, however, is the assumption that the payoffs of cooperating or defecting are well known to the parties before they interact and do not change over time or across repeated encounters. This is not always the case in real life. By following each rule blindly, individuals risk getting stuck in an unfavorable situation. Axelrod (1984) highlighted that the main problem is how to obtain the benefits of cooperation without passing through many trials and errors, which are slow and painful. With a better understanding of this process, individuals can use foresight to speed up the evolution of cooperation. Here I show that a multi-armed bandit (MAB) model, a classic problem in decision science, is naturally employed by individuals to opt for the best choice most of the time, accelerating the evolution of altruistic behavior and solving the problems above. A common MAB model that applies extremely well to the evolution of cooperation is the epsilon-greedy (ε-greedy) algorithm. After an initial period of exploration (which can be regarded as biological history), this algorithm greedily exploits the best-known option with probability 1 − ε and explores the other options with the remaining probability ε. Through the ε-greedy decision-making algorithm, cooperation evolves as a multilevel process nested in the hierarchical levels that exist among the five rules for the evolution of cooperation. Reinforcement learning of this kind, a subfield of artificial intelligence built on trial and error, provides a powerful tool to better understand, and even probabilistically quantify, the chances that cooperation has to evolve in a specific situation.
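As a concrete illustration of the decision rule described in the abstract, below is a minimal sketch of an ε-greedy multi-armed bandit in Python. It is an illustration under assumed settings, not the author's model from the paper: the three candidate strategies, the Gaussian payoff noise, and the value epsilon = 0.1 are hypothetical choices for demonstration.

import random

def epsilon_greedy_bandit(payoff_means, epsilon=0.1, rounds=10000):
    # Estimated mean payoff and play count for each strategy ("arm").
    n_arms = len(payoff_means)
    counts = [0] * n_arms
    estimates = [0.0] * n_arms
    total_payoff = 0.0
    for _ in range(rounds):
        if random.random() < epsilon:
            # Explore: try a random strategy (probability epsilon).
            arm = random.randrange(n_arms)
        else:
            # Exploit: play the strategy with the best estimate so far.
            arm = max(range(n_arms), key=lambda a: estimates[a])
        # Draw a noisy payoff around the arm's (unknown) true mean.
        payoff = random.gauss(payoff_means[arm], 1.0)
        counts[arm] += 1
        # Incremental running-mean update of the payoff estimate.
        estimates[arm] += (payoff - estimates[arm]) / counts[arm]
        total_payoff += payoff
    return estimates, total_payoff

# Hypothetical example: three social strategies (e.g. always defect,
# tit-for-tat, unconditional cooperation) with made-up true payoffs.
estimates, total = epsilon_greedy_bandit([0.5, 1.0, 1.5])
print(estimates, total)

With a small epsilon, the agent converges on the highest-payoff strategy while still occasionally sampling the alternatives, which lets it adapt if payoffs change across repeated encounters, precisely the situation the abstract argues the five classical rules do not handle.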
Pages: 10
Related Papers
Showing 10 of 50 records
  • [1] Improving throughput using multi-armed bandit algorithm for wireless LANs
    Kuroda, Kaori
    Kato, Hiroki
    Kim, Song-Ju
    Naruse, Makoto
    Hasegawa, Mikio
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2018, 9 (01): 74-81
  • [2] Combinatorial Multi-Armed Bandit with General Reward Functions
    Chen, Wei
    Hu, Wei
    Li, Fu
    Li, Jian
    Liu, Yu
    Lu, Pinyan
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 29 (NIPS 2016), 2016, 29
  • [3] Interface Design Optimization as a Multi-Armed Bandit Problem
    Lomas, J. Derek
    Forlizzi, Jodi
    Poonwala, Nikhil
    Patel, Nirmal
    Shodhan, Sharan
    Patel, Kishan
    Koedinger, Ken
    Brunskill, Emma
    34TH ANNUAL CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2016, 2016: 4142-4153
  • [4] Multi-armed Bandit Algorithms for Adaptive Learning: A Survey
    Mui, John
    Lin, Fuhua
    Dewan, M. Ali Akber
    ARTIFICIAL INTELLIGENCE IN EDUCATION (AIED 2021), PT II, 2021, 12749: 273-278
  • [5] Optimal activation of halting multi-armed bandit models
    Cowan, Wesley
    Katehakis, Michael N.
    Ross, Sheldon M.
    NAVAL RESEARCH LOGISTICS, 2023, 70 (07): 639-652
  • [6] Efficient wireless network selection by using multi-armed bandit algorithm for mobile terminals
    Oshima, Koji
    Onishi, Takuma
    Kim, Song-Ju
    Ma, Jing
    Hasegawa, Mikio
    IEICE NONLINEAR THEORY AND ITS APPLICATIONS, 2020, 11 (01): 68-77
  • [7] Multi-armed bandit algorithm for sequential experiments of molecular properties with dynamic feature selection
    Abedin, Md. Menhazul
    Tabata, Koji
    Matsumura, Yoshihiro
    Komatsuzaki, Tamiki
    JOURNAL OF CHEMICAL PHYSICS, 2024, 161 (01)
  • [8] Multi-objective Contextual Multi-armed Bandit With a Dominant Objective
    Tekin, Cem
    Turgay, Eralp
    IEEE TRANSACTIONS ON SIGNAL PROCESSING, 2018, 66 (14): 3799-3813
  • [9] Design of Multi-Armed Bandit-Based Routing for in-Network Caching
    Tabei, Gen
    Ito, Yusuke
    Kimura, Tomotaka
    Hirata, Kouji
    IEEE ACCESS, 2023, 11: 82584-82600
  • [10] Deterministic Sequencing of Exploration and Exploitation for Multi-Armed Bandit Problems
    Vakili, Sattar
    Liu, Keqin
    Zhao, Qing
    IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING, 2013, 7 (05): 759-767