Transfer Learning in Multi-Armed Bandits: A Causal Approach

Cited by: 0
Authors: Zhang, Junzhe [1]; Bareinboim, Elias [1]
Affiliations: [1] Purdue Univ, W Lafayette, IN 47907 USA
Keywords: none listed
DOI: none listed
CLC classification: TP18 [theory of artificial intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods for inferring the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully exploited in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] or standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arms' reward distributions based on structural knowledge; second, incorporating these bounds into a dynamic allocation procedure so as to guide the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms, achieving orders-of-magnitude faster convergence rates. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
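The two-step strategy described in the abstract (derive causal bounds over the arm rewards, then fold them into the allocation rule) can be illustrated with a minimal sketch. The code below is not the authors' algorithm: it assumes upper bounds on each arm's mean reward have already been derived offline (the `causal_bounds` values here are hypothetical) and simply clips a standard UCB1 index at those bounds, so that an arm whose bound falls below the running best stops attracting exploration.

```python
import numpy as np

def clipped_ucb(arm_means, causal_bounds, horizon=5000, seed=0):
    """UCB1 on Bernoulli arms, with indices truncated at upper
    bounds on each arm's mean reward (assumed to be derived offline
    from structural/causal knowledge). Returns cumulative regret."""
    rng = np.random.default_rng(seed)
    K = len(arm_means)
    counts, sums = np.zeros(K), np.zeros(K)
    best, regret = max(arm_means), 0.0

    # Initialize by pulling each arm once.
    for k in range(K):
        counts[k] += 1
        sums[k] += rng.random() < arm_means[k]
        regret += best - arm_means[k]

    for t in range(K, horizon):
        # Standard UCB1 exploration index ...
        ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
        # ... clipped at the causal bound: an arm can never look
        # more promising than its bound allows, so arms whose bound
        # lies below the best achievable mean are discarded early.
        k = int(np.argmax(np.minimum(ucb, causal_bounds)))
        counts[k] += 1
        sums[k] += rng.random() < arm_means[k]
        regret += best - arm_means[k]
    return regret

# Hypothetical two-arm instance: the bound 0.3 on arm 0 rules it
# out quickly, since arm 1's empirical mean soon exceeds it.
print(clipped_ucb(arm_means=[0.2, 0.5], causal_bounds=[0.3, 1.0]))
```

In the paper itself, the bounds are derived from the causal model of the confounded bandit instance and combined with a more refined kl-UCB-style index, but the clipping idea is the same: the bound caps how optimistic the learner may be about each arm.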
Pages: 1340-1346
Page count: 7