Infinite Horizon Multi-armed Bandits with Reward Vectors: Exploration/Exploitation Trade-off

Cited by: 0
Authors
Drugan, Madalina M. [1]
Affiliation
[1] Vrije Univ Brussel, Artificial Intelligence Lab, Pleinlaan 2, B-1050 Brussels, Belgium
Source
AGENTS AND ARTIFICIAL INTELLIGENCE, ICAART 2015 | 2015 / Vol. 9494
Keywords
Multi-armed bandits; Multi-objective optimisation; Pareto dominance relation; Infinite horizon policies
DOI
10.1007/978-3-319-27947-3_7
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We focus on the effect of exploration/exploitation trade-off strategies on the algorithmic design of multi-armed bandits (MAB) with reward vectors. The Pareto dominance relation assesses the quality of reward vectors in infinite horizon MABs, such as the UCB1 and UCB2 algorithms. In single-objective MABs, there is a trade-off between exploring the suboptimal arms and exploiting a single optimal arm. Pareto dominance based MABs fairly exploit all Pareto optimal arms and explore the suboptimal arms. We study the exploration vs. exploitation trade-off for two UCB-like algorithms for reward vectors. We analyse the properties of the proposed MAB algorithms in terms of upper regret bounds, and we experimentally compare their exploration vs. exploitation trade-off on a bi-objective Bernoulli environment coming from control theory.
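The idea in the abstract of exploiting all Pareto optimal arms fairly can be illustrated with a minimal Python sketch. It is not the paper's algorithm: it assumes the standard UCB1 exploration bonus sqrt(2 ln t / n_i) added to every objective, and a uniform random choice among the arms whose index vectors are non-dominated. The function names, the helper `bernoulli_env`, and the arm means in the usage note are hypothetical and chosen only for illustration.

```python
import math
import random


def dominates(u, v):
    """True if u Pareto-dominates v (maximisation):
    u is at least as good in every objective and strictly better in one."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(vectors):
    """Indices of the vectors that no other vector dominates."""
    return [
        i for i, u in enumerate(vectors)
        if not any(dominates(v, u) for j, v in enumerate(vectors) if j != i)
    ]


def pareto_ucb1(pull, n_arms, n_objectives, horizon, rng):
    """Simplified Pareto-style UCB1 (assumption: plain UCB1 bonus per objective).

    pull(arm) must return a reward vector of length n_objectives.
    Returns the number of times each arm was played.
    """
    counts = [0] * n_arms
    sums = [[0.0] * n_objectives for _ in range(n_arms)]
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1  # initialisation: play every arm once
        else:
            index = []
            for i in range(n_arms):
                bonus = math.sqrt(2.0 * math.log(t) / counts[i])
                index.append([s / counts[i] + bonus for s in sums[i]])
            # exploit fairly: pick uniformly among non-dominated index vectors
            arm = rng.choice(pareto_front(index))
        reward = pull(arm)
        counts[arm] += 1
        for d in range(n_objectives):
            sums[arm][d] += reward[d]
    return counts


def bernoulli_env(means, rng):
    """Hypothetical environment: each objective of each arm is an
    independent Bernoulli variable with the given mean."""
    def pull(arm):
        return [1.0 if rng.random() < p else 0.0 for p in means[arm]]
    return pull
```

For example, with hypothetical means `[(0.9, 0.1), (0.1, 0.9), (0.05, 0.05)]`, the first two arms are Pareto optimal and the third is dominated by both. Calling `pareto_ucb1(bernoulli_env(means, rng), 3, 2, 2000, random.Random(0))` returns the per-arm play counts, which can be inspected to see how the plays split between the two optimal arms and the suboptimal one.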
Pages: 128 - 144
Number of pages: 17
Related papers
8 items in total
  • [1] Tsallis-INF for Decoupled Exploration and Exploitation in Multi-armed Bandits
    Rouyer, Chloe
    Seldin, Yevgeny
    CONFERENCE ON LEARNING THEORY, VOL 125, 2020, 125
  • [2] Exploration-exploitation tradeoff using variance estimates in multi-armed bandits
    Audibert, Jean-Yves
    Munos, Remi
    Szepesvari, Csaba
    THEORETICAL COMPUTER SCIENCE, 2009, 410 (19) : 1876 - 1902
  • [3] Adaptive Noise Exploration for Neural Contextual Multi-Armed Bandits
    Wang, Chi
    Shi, Lin
    Luo, Junru
    ALGORITHMS, 2025, 18 (02)
  • [4] Minimax Off-Policy Evaluation for Multi-Armed Bandits
    Ma, Cong
    Zhu, Banghua
    Jiao, Jiantao
    Wainwright, Martin J.
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2022, 68 (08) : 5314 - 5339
  • [5] Fast Beam Alignment via Pure Exploration in Multi-Armed Bandits
    Wei, Yi
    Zhong, Zixin
    Tan, Vincent Y. F.
    IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2023, 22 (05) : 3264 - 3279
  • [6] The Exploration-Exploitation Trade-off in Interactive Recommender Systems
    Barraza-Urbina, Andrea
    PROCEEDINGS OF THE ELEVENTH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS'17), 2017 : 431 - 435
  • [7] Exploration with Limited Memory: Streaming Algorithms for Coin Tossing, Noisy Comparisons, and Multi-armed Bandits
    Assadi, Sepehr
    Wang, Chen
    PROCEEDINGS OF THE 52ND ANNUAL ACM SIGACT SYMPOSIUM ON THEORY OF COMPUTING (STOC '20), 2020 : 1237 - 1250
  • [8] Finding the optimal exploration-exploitation trade-off online through Bayesian risk estimation and minimization
    Jamieson, Stewart
    How, Jonathan P.
    Girdhar, Yogesh
    ARTIFICIAL INTELLIGENCE, 2024, 330