Cooperative Learning for Adversarial Multi-Armed Bandit on Open Multi-Agent Systems

Cited by: 1
|
Authors
Nakamura, Tomoki [1 ]
Hayashi, Naoki [1 ]
Inuiguchi, Masahiro [1 ]
Affiliations
[1] Osaka Univ, Grad Sch Engn Sci, Toyonaka, Osaka 5608531, Japan
Source
IEEE CONTROL SYSTEMS LETTERS | 2023 / Vol. 7
Keywords
Estimation; Multi-agent systems; Decision making; Upper bound; Peer-to-peer computing; Optimization; Heuristic algorithms; Cooperative learning; adversarial multiarmed bandit; open multi-agent system;
DOI
10.1109/LCSYS.2023.3279788
Chinese Library Classification
TP [Automation technology, computer technology];
Subject classification code
0812 ;
Abstract
This letter considers a cooperative decision-making method for an adversarial bandit problem on open multi-agent systems. In an open multi-agent system, the network configuration changes dynamically as agents freely enter and leave the network. We propose a distributed Exp3 policy in which a group of agents exchanges estimates of the expected reward of each arm with active neighboring agents. Each agent then updates its probability distribution over the arms by combining the estimated rewards of its neighboring agents. We derive a sufficient condition for a sublinear bound on the pseudo-regret. A numerical example shows that active agents can cooperatively find the optimal arm using the proposed Exp3 policy.
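The abstract describes agents exchanging per-arm reward estimates with currently active neighbors and combining them before updating their arm-selection distribution. The sketch below is a minimal, illustrative Exp3-style simulation of that idea, not the authors' algorithm: the exploration rate GAMMA, the uniform neighbor-averaging step, and all class and variable names are assumptions made for the example.

# Minimal sketch: Exp3-style agents that average reward estimates
# with active neighbors before choosing an arm (illustrative only).
import math
import random

K = 5          # number of arms
GAMMA = 0.1    # exploration parameter (assumed value)


class Agent:
    def __init__(self):
        # cumulative importance-weighted reward estimate per arm
        self.estimates = [0.0] * K

    def probabilities(self):
        # standard Exp3 mixing of exponential weights with uniform exploration
        m = max(self.estimates)  # subtract max for numerical stability
        weights = [math.exp(GAMMA * (s - m) / K) for s in self.estimates]
        total = sum(weights)
        return [(1 - GAMMA) * w / total + GAMMA / K for w in weights]

    def step(self, rewards, neighbours):
        # 1) cooperation step: average estimates with active neighbours
        group = [self] + neighbours
        self.estimates = [
            sum(a.estimates[k] for a in group) / len(group) for k in range(K)
        ]
        # 2) Exp3 step: sample an arm and add an unbiased reward estimate
        p = self.probabilities()
        arm = random.choices(range(K), weights=p)[0]
        self.estimates[arm] += rewards[arm] / p[arm]
        return arm

An open network can be emulated by changing each agent's neighbour list (i.e., the set of active agents) from round to round before calling step().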
Pages: 1712 - 1717
Number of pages: 6
Related papers
50 records in total
  • [1] Multi-agent Multi-armed Bandit Learning for Content Caching in Edge Networks
    Su, Lina
    Zhou, Ruiting
    Wang, Ne
    Chen, Junmei
    Li, Zongpeng
    2022 IEEE INTERNATIONAL CONFERENCE ON WEB SERVICES (IEEE ICWS 2022), 2022, : 11 - 16
  • [2] Decentralized Multi-Agent Multi-Armed Bandit Learning With Calibration for Multi-Cell Caching
    Xu, Xianzhe
    Tao, Meixia
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2021, 69 (04) : 2457 - 2472
  • [3] Collaborative Multi-Agent Multi-Armed Bandit Learning for Small-Cell Caching
    Xu, Xianzhe
    Tao, Meixia
    Shen, Cong
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2020, 19 (04) : 2570 - 2585
  • [4] A Dynamic Observation Strategy for Multi-agent Multi-armed Bandit Problem
    Madhushani, Udari
    Leonard, Naomi Ehrich
    2020 EUROPEAN CONTROL CONFERENCE (ECC 2020), 2020, : 1677 - 1682
  • [5] Multi-Agent Multi-Armed Bandit Learning for Online Management of Edge-Assisted Computing
    Wu, Bochun
    Chen, Tianyi
    Ni, Wei
    Wang, Xin
    IEEE TRANSACTIONS ON COMMUNICATIONS, 2021, 69 (12) : 8188 - 8199
  • [6] Achieving Privacy in the Adversarial Multi-Armed Bandit
    Tossou, Aristide C. Y.
    Dimitrakakis, Christos
    THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, : 2653 - 2659
  • [7] Bridging Adversarial and Nonstationary Multi-Armed Bandit
    Chen, Ningyuan
    Yang, Shuoguang
    Zhang, Hailun
    PRODUCTION AND OPERATIONS MANAGEMENT, 2025,
  • [8] An Efficient Algorithm for Fair Multi-Agent Multi-Armed Bandit with Low Regret
    Jones, Matthew
    Huy Nguyen
    Thy Nguyen
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 7, 2023, : 8159 - 8167
  • [9] Decentralized Randomly Distributed Multi-agent Multi-armed Bandit with Heterogeneous Rewards
    Xu, Mengfan
    Klabjan, Diego
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 36 (NEURIPS 2023), 2023,
  • [10] Sustainable Cooperative Coevolution with a Multi-Armed Bandit
    De Rainville, Francois-Michel
    Sebag, Michele
    Gagne, Christian
    Schoenauer, Marc
    Laurendeau, Denis
    GECCO'13: PROCEEDINGS OF THE 2013 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2013, : 1517 - 1524