EXPLOITING SIMILARITY INFORMATION IN REINFORCEMENT LEARNING: Similarity Models for Multi-Armed Bandits and MDPs

Cited by: 0
Authors
Ortner, Ronald [1]
Affiliations
[1] Univ Leoben, Lehrstuhl Informat Technol, Leoben, Austria
Source
ICAART 2010: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1: ARTIFICIAL INTELLIGENCE | 2010
Funding
Austrian Science Fund;
Keywords
Reinforcement learning; Markov decision process; Multi-armed bandit; Similarity; Regret;
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows the color of each arm, under the assumption that arms of the same color have close mean rewards. We present an algorithm showing that this color information can be used to improve the dependence of online regret bounds on the number of arms. Further, we discuss to what extent this approach extends to the more general case of Markov decision processes. For the simplest case, where the same color for actions means similar rewards and identical transition probabilities, an algorithm and a corresponding online regret bound are given. For the general case, where the same color implies only close but not necessarily identical transition probabilities, we give upper and lower bounds on the error incurred by aggregating actions according to the color information. These bounds also imply that the general case is far more difficult to handle.
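The color-based bandit setting described in the abstract lends itself to a small illustrative sketch: pool reward statistics per color, select a color with a UCB1-style index, then select an arm within that color the same way. This is only a minimal sketch of the general idea, not the algorithm analysed in the paper; the class name ColorUCB, the two-level index, and the confidence term are assumptions made for this example.

```python
import math
import random

class ColorUCB:
    """Minimal two-level UCB sketch for bandits with color information.

    Assumption (from the abstract): arms of the same color have close
    mean rewards. Illustrative only; NOT the paper's algorithm, and all
    names and constants here are made up for this sketch.
    """

    def __init__(self, colors):
        self.colors = colors                # colors[i] = color label of arm i
        self.arms_of = {}                   # color -> list of its arms
        for arm, color in enumerate(colors):
            self.arms_of.setdefault(color, []).append(arm)
        self.counts = [0] * len(colors)     # pulls per arm
        self.sums = [0.0] * len(colors)     # reward sums per arm
        self.t = 0                          # total pulls so far

    def _index(self, reward_sum, count):
        # Standard UCB1 index: empirical mean plus confidence width.
        if count == 0:
            return float("inf")
        return reward_sum / count + math.sqrt(2.0 * math.log(max(self.t, 2)) / count)

    def select_arm(self):
        self.t += 1
        # Level 1: UCB over colors, using statistics pooled per color.
        color = max(
            self.arms_of,
            key=lambda c: self._index(
                sum(self.sums[a] for a in self.arms_of[c]),
                sum(self.counts[a] for a in self.arms_of[c]),
            ),
        )
        # Level 2: UCB over the arms of the chosen color.
        return max(self.arms_of[color],
                   key=lambda a: self._index(self.sums[a], self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

# Tiny usage sketch with Bernoulli arms whose means are grouped by color.
means = [0.30, 0.32, 0.31, 0.70, 0.68]      # arms 0-2 vs. arms 3-4
bandit = ColorUCB(colors=[0, 0, 0, 1, 1])
for _ in range(10000):
    arm = bandit.select_arm()
    bandit.update(arm, 1.0 if random.random() < means[arm] else 0.0)
```

Pooling statistics per color lets the top-level confidence width shrink with the total pulls of all same-colored arms, so top-level exploration cost scales with the number of colors rather than the number of arms; up to the error introduced by within-color reward differences, this is the kind of improved arm dependence in the regret bound that the abstract describes.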
Pages: 203 - 210
Number of pages: 8
Related Papers
50 records in total
  • [41] MAB-OS: Multi-Armed Bandits Metaheuristic Optimizer Selection
    Meidani, Kazem
    Mirjalili, Seyedali
    Farimani, Amir Barati
    APPLIED SOFT COMPUTING, 2022, 128
  • [42] Potential and pitfalls of Multi-Armed Bandits for decentralized Spatial Reuse in WLANs
    Wilhelmi, Francesc
    Barrachina-Munoz, Sergio
    Bellalta, Boris
    Cano, Cristina
    Jonsson, Anders
    Neu, Gergely
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2019, 127 : 26 - 42
  • [43] Secure protocols for cumulative reward maximization in stochastic multi-armed bandits
    Ciucanu, Radu
    Lafourcade, Pascal
    Lombard-Platet, Marius
    Soare, Marta
    JOURNAL OF COMPUTER SECURITY, 2023, 31 (01) : 1 - 27
  • [44] SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits
    Boursier, Etienne
    Perchet, Vianney
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [45] Multi-Armed Bandits with Fairness Constraints for Distributing Resources to Human Teammates
    Claure, Houston
    Chen, Yifang
    Modi, Jignesh
    Jung, Malte
    Nikolaidis, Stefanos
    PROCEEDINGS OF THE 2020 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI '20), 2020, : 299 - 308
  • [46] Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits
    Park, Hongju
    Faradonbeh, Mohamad Kazem Shirani
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2150 - 2155
  • [47] Learning Early Exit for Deep Neural Network Inference on Mobile Devices through Multi-Armed Bandits
    Ju, Weiyu
    Bao, Wei
    Yuan, Dong
    Ge, Liming
    Zhou, Bing Bing
    21ST IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2021), 2021, : 11 - 20
  • [48] Decentralized AP selection using Multi-Armed Bandits: Opportunistic ε-Greedy with Stickiness
    Carrascosa, Marc
    Bellalta, Boris
    2019 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2019, : 309 - 315
  • [49] Multi-Armed Bandits for Minesweeper: Profiting From Exploration-Exploitation Synergy
    Lordeiro, Igor Q.
    Haddad, Diego B.
    Cardoso, Douglas O.
    IEEE TRANSACTIONS ON GAMES, 2022, 14 (03) : 403 - 412
  • [50] Risk-Aware Multi-Armed Bandits With Refined Upper Confidence Bounds
    Liu, Xingchi
    Derakhshani, Mahsa
    Lambotharan, Sangarapillai
    van der Schaar, Mihaela
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 269 - 273