Infinite Horizon Multi-armed Bandits with Reward Vectors: Exploration/Exploitation Trade-off

Cited by: 0

Authors:

Drugan, Madalina M. [1]

Affiliation:

[1] Vrije Univ Brussel, Artificial Intelligence Lab, Pleinlaan 2, B-1050 Brussels, Belgium

Source:

AGENTS AND ARTIFICIAL INTELLIGENCE, ICAART 2015 | 2015 / Vol. 9494

Keywords:

Multi-armed bandits; Multi-objective optimisation; Pareto dominance relation; Infinite horizon policies

DOI:

10.1007/978-3-319-27947-3_7

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory]

Discipline classification code:

081104; 0812; 0835; 1405

Abstract:

We focus on the effect of exploration/exploitation trade-off strategies on the algorithmic design of multi-armed bandits (MABs) with reward vectors. The Pareto dominance relation assesses the quality of reward vectors in infinite horizon MABs, such as the UCB1 and UCB2 algorithms. In single-objective MABs, there is a trade-off between the exploration of suboptimal arms and the exploitation of a single optimal arm. Pareto dominance based MABs fairly exploit all Pareto optimal arms and explore suboptimal arms. We study the exploration vs. exploitation trade-off for two UCB-like algorithms for reward vectors. We analyse the properties of the proposed MAB algorithms in terms of upper regret bounds, and we experimentally compare their exploration vs. exploitation trade-off on a bi-objective Bernoulli environment coming from control theory.
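As an illustration of the idea summarised in the abstract, the following is a minimal Python sketch of a Pareto dominance based UCB policy on a bi-objective Bernoulli bandit. It is not the paper's algorithm: the exploration bonus is the classic single-objective UCB1 term sqrt(2 ln t / n) applied to every objective, and the function names (`dominates`, `pareto_front`, `ucb_index`, `run`) and the arm means in the usage note are assumptions made for this example only.

```python
import math
import random


def dominates(u, v):
    """True if reward vector u Pareto-dominates v (all objectives maximised)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(vectors):
    """Indices of the vectors that no other vector dominates."""
    return [
        i for i, u in enumerate(vectors)
        if not any(dominates(v, u) for j, v in enumerate(vectors) if j != i)
    ]


def ucb_index(mean_vector, pulls, t):
    """Assumed index: empirical mean plus the classic UCB1 bonus per objective."""
    bonus = math.sqrt(2.0 * math.log(t) / pulls)
    return [m + bonus for m in mean_vector]


def run(true_means, horizon, seed=0):
    """Simulate the policy and return the number of pulls of each arm."""
    rng = random.Random(seed)
    k, d = len(true_means), len(true_means[0])
    pulls = [0] * k
    sums = [[0.0] * d for _ in range(k)]

    def pull(i):
        # Each objective pays an independent Bernoulli reward.
        reward = [1.0 if rng.random() < p else 0.0 for p in true_means[i]]
        pulls[i] += 1
        for j in range(d):
            sums[i][j] += reward[j]

    for i in range(k):  # initialisation: play every arm once
        pull(i)
    for t in range(k + 1, horizon + 1):
        indices = [
            ucb_index([s / pulls[i] for s in sums[i]], pulls[i], t)
            for i in range(k)
        ]
        # Fair exploitation: choose uniformly among the non-dominated indices.
        pull(rng.choice(pareto_front(indices)))
    return pulls
```

For example, `run([(0.8, 0.2), (0.2, 0.8), (0.1, 0.1)], horizon=1000)` uses hypothetical means in which the first two arms are Pareto optimal and the third is dominated by both. Choosing uniformly within the front is one simple way to express the "fair exploitation" of all Pareto optimal arms described above.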


Pages: 128-144

Page count: 17

