EXPLOITING SIMILARITY INFORMATION IN REINFORCEMENT LEARNING: Similarity Models for Multi-Armed Bandits and MDPs

Cited: 0
Authors
Ortner, Ronald [1 ]
Affiliations
[1] Univ Leoben, Lehrstuhl Informat Technol, Leoben, Austria
Source
ICAART 2010: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1: ARTIFICIAL INTELLIGENCE | 2010
Funding
Austrian Science Fund;
关键词
Reinforcement learning; Markov decision process; Multi-armed bandit; Similarity; Regret;
DOI
N/A
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows for each arm its color, under the assumption that arms of the same color have close mean rewards. We present an algorithm showing that this color information can be used to improve the dependence of online regret bounds on the number of arms. Further, we discuss to what extent this approach can be extended to the more general case of Markov decision processes. For the simplest case, where a shared color for actions means similar rewards and identical transition probabilities, we give an algorithm and a corresponding online regret bound. For the general case, where same-colored actions have only close but not necessarily identical transition probabilities, we give upper and lower bounds on the error incurred by aggregating actions according to their color. These bounds also imply that the general case is far more difficult to handle.
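The abstract only states that such an algorithm exists; as a loose, hypothetical illustration of the color-aggregation idea, the following Python sketch maintains a UCB1-style confidence index per color (treating each color as a meta-arm) and rotates round-robin through the arms of the chosen color. The design and all names (colored_ucb, arms_by_color, pull) are assumptions for illustration only, not Ortner's algorithm from the paper.

import math
import random

def colored_ucb(arms_by_color, pull, horizon):
    # Hypothetical sketch, not the paper's algorithm.
    # arms_by_color: dict mapping a color to the list of its arm ids.
    # pull: function arm_id -> stochastic reward in [0, 1].
    # Returns the total reward collected over `horizon` pulls.
    counts = {c: 0 for c in arms_by_color}   # pulls credited to each color
    sums = {c: 0.0 for c in arms_by_color}   # reward totals per color
    total, t = 0.0, 0
    # Initialization: sample one arm of every color once.
    for c, arms in arms_by_color.items():
        r = pull(random.choice(arms))
        counts[c] += 1; sums[c] += r; total += r; t += 1
    while t < horizon:
        # UCB1 index computed per color, i.e. the color acts as a meta-arm.
        def index(c):
            return sums[c] / counts[c] + math.sqrt(2 * math.log(t) / counts[c])
        best = max(arms_by_color, key=index)
        # Rotate through the chosen color's arms so all of them get sampled.
        arm = arms_by_color[best][counts[best] % len(arms_by_color[best])]
        r = pull(arm)
        counts[best] += 1; sums[best] += r; total += r; t += 1
    return total

For example, with two colors whose arms have close means (a made-up setup):

means = {"a1": 0.9, "a2": 0.85, "b1": 0.30, "b2": 0.35}
groups = {"red": ["a1", "a2"], "blue": ["b1", "b2"]}
reward = colored_ucb(groups, lambda a: float(random.random() < means[a]), 10000)

In this sketch, confidence intervals are maintained for two colors rather than four arms, so the exploration cost scales with the number of colors; the price is that arms within a color are not distinguished, adding a regret term of up to epsilon per step when same-colored means may differ by epsilon.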
Pages: 203-210
Page count: 8
Related Papers
50 items in total
  • [31] 2nd Workshop on Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond
    Wang, Chu
    Wang, Yingfei
    Luo, Haipeng
    Jiang, Daniel
    He, Jinghai
    Zheng, Zeyu
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5890 - 5890
  • [32] Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
    Koulouriotis, D. E.
    Xanthopoulos, A.
    APPLIED MATHEMATICS AND COMPUTATION, 2008, 196 (02) : 913 - 922
  • [33] Simple fixes that accommodate switching costs in multi-armed bandits
    Teymourian, Ehsan
    Yang, Jian
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2025, 320 (03) : 616 - 627
  • [34] A central limit theorem, loss aversion and multi-armed bandits
    Chen, Zengjing
    Epstein, Larry G.
    Zhang, Guodong
    JOURNAL OF ECONOMIC THEORY, 2023, 209
  • [35] Multi-armed bandits for decentralized AP selection in enterprise WLANs
    Carrascosa, Marc
    Bellalta, Boris
    COMPUTER COMMUNICATIONS, 2020, 159 : 108 - 123
  • [36] MULTI-ARMED BANDITS FOR HUMAN-MACHINE DECISION MAKING
    Reverdy, Paul
    Srivastava, Vaibhav
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6986 - 6990
  • [37] Multi-armed bandits in the wild: Pitfalls and strategies in online experiments
Mattos, David Issa
    Bosch, Jan
    Olsson, Helena Holmstrom
    INFORMATION AND SOFTWARE TECHNOLOGY, 2019, 113 : 68 - 81
  • [38] Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits
    Gray, Robert C.
    Zhu, Jichen
    Ontanon, Santiago
    2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 312 - 319
  • [39] Multi-armed bandits for adjudicating documents in pooling-based evaluation of information retrieval systems
    Losada, David E.
    Parapar, Javier
    Barreiro, Alvaro
    INFORMATION PROCESSING & MANAGEMENT, 2017, 53 (05) : 1005 - 1025
  • [40] Exploiting History Data for Nonstationary Multi-armed Bandit
    Re, Gerlando
    Chiusano, Fabio
    Trovo, Francesco
    Carrera, Diego
    Boracchi, Giacomo
    Restelli, Marcello
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, 2021, 12975 : 51 - 66