EXPLOITING SIMILARITY INFORMATION IN REINFORCEMENT LEARNING: Similarity Models for Multi-Armed Bandits and MDPs

Cited: 0
Authors
Ortner, Ronald [1 ]
Affiliations
[1] Univ Leoben, Lehrstuhl Informat Technol, Leoben, Austria
Source
ICAART 2010: PROCEEDINGS OF THE 2ND INTERNATIONAL CONFERENCE ON AGENTS AND ARTIFICIAL INTELLIGENCE, VOL 1: ARTIFICIAL INTELLIGENCE | 2010
Funding
Austrian Science Fund;
关键词
Reinforcement learning; Markov decision process; Multi-armed bandit; Similarity; Regret;
DOI
N/A
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper considers reinforcement learning problems with additional similarity information. We start with the simple setting of multi-armed bandits in which the learner knows for each arm its color, under the assumption that arms of the same color have close mean rewards. We present an algorithm showing that this color information can be used to improve the dependence of online regret bounds on the number of arms. Further, we discuss to what extent this approach can be extended to the more general case of Markov decision processes. For the simplest case, where a shared color for actions means similar rewards and identical transition probabilities, we give an algorithm and a corresponding online regret bound. For the general case, where same-colored actions have only close but not necessarily identical transition probabilities, we give upper and lower bounds on the error incurred by aggregating actions according to their color. These bounds also imply that the general case is far more difficult to handle.
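The abstract only states that such an algorithm exists; as a loose, hypothetical illustration of the color-aggregation idea, the following Python sketch maintains a UCB1-style confidence index per color (treating each color as a meta-arm) and rotates round-robin through the arms of the chosen color. The design and all names (colored_ucb, arms_by_color, pull) are assumptions for illustration only, not Ortner's algorithm from the paper.

import math
import random

def colored_ucb(arms_by_color, pull, horizon):
    # Hypothetical sketch, not the paper's algorithm.
    # arms_by_color: dict mapping a color to the list of its arm ids.
    # pull: function arm_id -> stochastic reward in [0, 1].
    # Returns the total reward collected over `horizon` pulls.
    counts = {c: 0 for c in arms_by_color}   # pulls credited to each color
    sums = {c: 0.0 for c in arms_by_color}   # reward totals per color
    total, t = 0.0, 0
    # Initialization: sample one arm of every color once.
    for c, arms in arms_by_color.items():
        r = pull(random.choice(arms))
        counts[c] += 1; sums[c] += r; total += r; t += 1
    while t < horizon:
        # UCB1 index computed per color, i.e. the color acts as a meta-arm.
        def index(c):
            return sums[c] / counts[c] + math.sqrt(2 * math.log(t) / counts[c])
        best = max(arms_by_color, key=index)
        # Rotate through the chosen color's arms so all of them get sampled.
        arm = arms_by_color[best][counts[best] % len(arms_by_color[best])]
        r = pull(arm)
        counts[best] += 1; sums[best] += r; total += r; t += 1
    return total

For example, with two colors whose arms have close means (a made-up setup):

means = {"a1": 0.9, "a2": 0.85, "b1": 0.30, "b2": 0.35}
groups = {"red": ["a1", "a2"], "blue": ["b1", "b2"]}
reward = colored_ucb(groups, lambda a: float(random.random() < means[a]), 10000)

In this sketch, confidence intervals are maintained for two colors rather than four arms, so the exploration cost scales with the number of colors; the price is that arms within a color are not distinguished, adding a regret term of up to epsilon per step when same-colored means may differ by epsilon.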
Pages: 203-210
Page count: 8
Related Papers
50 items in total
  • [31] 2nd Workshop on Multi-Armed Bandits and Reinforcement Learning: Advancing Decision Making in E-Commerce and Beyond
    Wang, Chu
    Wang, Yingfei
    Luo, Haipeng
    Jiang, Daniel
    He, Jinghai
    Zheng, Zeyu
    PROCEEDINGS OF THE 29TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2023, 2023, : 5890 - 5890
  • [32] Reinforcement learning and evolutionary algorithms for non-stationary multi-armed bandit problems
    Koulouriotis, D. E.
    Xanthopoulos, A.
    APPLIED MATHEMATICS AND COMPUTATION, 2008, 196 (02) : 913 - 922
  • [33] Simple fixes that accommodate switching costs in multi-armed bandits
    Teymourian, Ehsan
    Yang, Jian
    EUROPEAN JOURNAL OF OPERATIONAL RESEARCH, 2025, 320 (03) : 616 - 627
  • [34] A central limit theorem, loss aversion and multi-armed bandits
    Chen, Zengjing
    Epstein, Larry G.
    Zhang, Guodong
    JOURNAL OF ECONOMIC THEORY, 2023, 209
  • [35] Multi-armed bandits for decentralized AP selection in enterprise WLANs
    Carrascosa, Marc
    Bellalta, Boris
    COMPUTER COMMUNICATIONS, 2020, 159 : 108 - 123
  • [36] MULTI-ARMED BANDITS FOR HUMAN-MACHINE DECISION MAKING
    Reverdy, Paul
    Srivastava, Vaibhav
    2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), 2018, : 6986 - 6990
  • [37] Multi-armed bandits in the wild: Pitfalls and strategies in online experiments
Mattos, David Issa
    Bosch, Jan
    Olsson, Helena Holmstrom
    INFORMATION AND SOFTWARE TECHNOLOGY, 2019, 113 : 68 - 81
  • [38] Regression Oracles and Exploration Strategies for Short-Horizon Multi-Armed Bandits
    Gray, Robert C.
    Zhu, Jichen
    Ontanon, Santiago
    2020 IEEE CONFERENCE ON GAMES (IEEE COG 2020), 2020, : 312 - 319
  • [39] Multi-armed bandits for adjudicating documents in pooling-based evaluation of information retrieval systems
    Losada, David E.
    Parapar, Javier
    Barreiro, Alvaro
    INFORMATION PROCESSING & MANAGEMENT, 2017, 53 (05) : 1005 - 1025
  • [40] Exploiting History Data for Nonstationary Multi-armed Bandit
    Re, Gerlando
    Chiusano, Fabio
    Trovo, Francesco
    Carrera, Diego
    Boracchi, Giacomo
    Restelli, Marcello
    MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, 2021, 12975 : 51 - 66