EXPLOITING SIMILARITY INFORMATION IN REINFORCEMENT LEARNING: Similarity Models for Multi-Armed Bandits and MDPs

Cited by: 0
Authors
Ortner, Ronald [1]
Affiliations
[1] Univ Leoben, Lehrstuhl Informat Technol, Leoben, Austria
Source
ICAART 2010: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1: ARTIFICIAL INTELLIGENCE | 2010
Funding
Austrian Science Fund;
Keywords
Reinforcement learning; Markov decision process; Multi-armed bandit; Similarity; Regret;
DOI
Not available
CLC Classification Number
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows the color of each arm, under the assumption that arms of the same color have close mean rewards. We present an algorithm showing that this color information can be used to improve the dependence of online regret bounds on the number of arms. Further, we discuss to what extent this approach extends to the more general case of Markov decision processes. For the simplest case, where the same color for actions means similar rewards and identical transition probabilities, an algorithm and a corresponding online regret bound are given. For the general case, where the same color implies only close but not necessarily identical transition probabilities, we give upper and lower bounds on the error incurred by aggregating actions according to the color information. These bounds also imply that the general case is far more difficult to handle.
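The color-based bandit setting described in the abstract lends itself to a small illustrative sketch: pool reward statistics per color, select a color with a UCB1-style index, then select an arm within that color the same way. This is only a minimal sketch of the general idea, not the algorithm analysed in the paper; the class name ColorUCB, the two-level index, and the confidence term are assumptions made for this example.

```python
import math
import random

class ColorUCB:
    """Minimal two-level UCB sketch for bandits with color information.

    Assumption (from the abstract): arms of the same color have close
    mean rewards. Illustrative only; NOT the paper's algorithm, and all
    names and constants here are made up for this sketch.
    """

    def __init__(self, colors):
        self.colors = colors                # colors[i] = color label of arm i
        self.arms_of = {}                   # color -> list of its arms
        for arm, color in enumerate(colors):
            self.arms_of.setdefault(color, []).append(arm)
        self.counts = [0] * len(colors)     # pulls per arm
        self.sums = [0.0] * len(colors)     # reward sums per arm
        self.t = 0                          # total pulls so far

    def _index(self, reward_sum, count):
        # Standard UCB1 index: empirical mean plus confidence width.
        if count == 0:
            return float("inf")
        return reward_sum / count + math.sqrt(2.0 * math.log(max(self.t, 2)) / count)

    def select_arm(self):
        self.t += 1
        # Level 1: UCB over colors, using statistics pooled per color.
        color = max(
            self.arms_of,
            key=lambda c: self._index(
                sum(self.sums[a] for a in self.arms_of[c]),
                sum(self.counts[a] for a in self.arms_of[c]),
            ),
        )
        # Level 2: UCB over the arms of the chosen color.
        return max(self.arms_of[color],
                   key=lambda a: self._index(self.sums[a], self.counts[a]))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.sums[arm] += reward

# Tiny usage sketch with Bernoulli arms whose means are grouped by color.
means = [0.30, 0.32, 0.31, 0.70, 0.68]      # arms 0-2 vs. arms 3-4
bandit = ColorUCB(colors=[0, 0, 0, 1, 1])
for _ in range(10000):
    arm = bandit.select_arm()
    bandit.update(arm, 1.0 if random.random() < means[arm] else 0.0)
```

Pooling statistics per color lets the top-level confidence width shrink with the total pulls of all same-colored arms, so top-level exploration cost scales with the number of colors rather than the number of arms; up to the error introduced by within-color reward differences, this is the kind of improved arm dependence in the regret bound that the abstract describes.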
Pages: 203 - 210
Number of pages: 8
Related Papers
50 records in total
  • [41] MAB-OS: Multi-Armed Bandits Metaheuristic Optimizer Selection
    Meidani, Kazem
    Mirjalili, Seyedali
    Farimani, Amir Barati
    APPLIED SOFT COMPUTING, 2022, 128
  • [42] Potential and pitfalls of Multi-Armed Bandits for decentralized Spatial Reuse in WLANs
    Wilhelmi, Francesc
    Barrachina-Munoz, Sergio
    Bellalta, Boris
    Cano, Cristina
    Jonsson, Anders
    Neu, Gergely
    JOURNAL OF NETWORK AND COMPUTER APPLICATIONS, 2019, 127 : 26 - 42
  • [43] Secure protocols for cumulative reward maximization in stochastic multi-armed bandits
    Ciucanu, Radu
    Lafourcade, Pascal
    Lombard-Platet, Marius
    Soare, Marta
    JOURNAL OF COMPUTER SECURITY, 2023, 31 (01) : 1 - 27
  • [44] SIC-MMAB: Synchronisation Involves Communication in Multiplayer Multi-Armed Bandits
    Boursier, Etienne
    Perchet, Vianney
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [45] Multi-Armed Bandits with Fairness Constraints for Distributing Resources to Human Teammates
    Claure, Houston
    Chen, Yifang
    Modi, Jignesh
    Jung, Malte
    Nikolaidis, Stefanos
    PROCEEDINGS OF THE 2020 ACM/IEEE INTERNATIONAL CONFERENCE ON HUMAN-ROBOT INTERACTION (HRI '20), 2020, : 299 - 308
  • [46] Analysis of Thompson Sampling for Partially Observable Contextual Multi-Armed Bandits
    Park, Hongju
    Faradonbeh, Mohamad Kazem Shirani
    IEEE CONTROL SYSTEMS LETTERS, 2022, 6 : 2150 - 2155
  • [47] Learning Early Exit for Deep Neural Network Inference on Mobile Devices through Multi-Armed Bandits
    Ju, Weiyu
    Bao, Wei
    Yuan, Dong
    Ge, Liming
    Zhou, Bing Bing
    21ST IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND INTERNET COMPUTING (CCGRID 2021), 2021, : 11 - 20
  • [48] Decentralized AP selection using Multi-Armed Bandits: Opportunistic ε-Greedy with Stickiness
    Carrascosa, Marc
    Bellalta, Boris
    2019 IEEE SYMPOSIUM ON COMPUTERS AND COMMUNICATIONS (ISCC), 2019, : 309 - 315
  • [49] Multi-Armed Bandits for Minesweeper: Profiting From Exploration-Exploitation Synergy
    Lordeiro, Igor Q.
    Haddad, Diego B.
    Cardoso, Douglas O.
    IEEE TRANSACTIONS ON GAMES, 2022, 14 (03) : 403 - 412
  • [50] Risk-Aware Multi-Armed Bandits With Refined Upper Confidence Bounds
    Liu, Xingchi
    Derakhshani, Mahsa
    Lambotharan, Sangarapillai
    van der Schaar, Mihaela
    IEEE SIGNAL PROCESSING LETTERS, 2021, 28 : 269 - 273