Transfer Learning in Multi-Armed Bandits: A Causal Approach

Cited by: 0
Authors: Zhang, Junzhe [1]; Bareinboim, Elias [1]
Affiliations: [1] Purdue Univ, W Lafayette, IN 47907 USA
Keywords: (none listed)
DOI: (none)
CLC classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods for inferring the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully realized in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] and standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arms' distributions based on structural knowledge; second, incorporating these bounds in a dynamic allocation procedure so as to guide the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms, achieving convergence rates that are orders of magnitude faster. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
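To make the second step concrete, below is a minimal Python sketch, not the paper's actual procedure: a standard UCB1 allocation whose per-arm exploration indices are clipped by externally supplied upper bounds on the arm means, standing in for the causally derived bounds of step one. The function `bounded_ucb` and its arguments `bandit`, `horizon`, and `upper_bounds` are illustrative assumptions.

```python
import numpy as np

def bounded_ucb(bandit, horizon, upper_bounds):
    """Sketch: UCB1 whose indices are clipped by prior upper bounds.

    `bandit(k)` is assumed to return a reward in [0, 1] for arm k;
    `upper_bounds[k]` is an a-priori upper bound on arm k's mean,
    e.g. obtained from observational data via a bounding argument.
    """
    K = len(upper_bounds)
    counts = np.zeros(K)
    sums = np.zeros(K)

    # Initialize the empirical means by pulling each arm once.
    for k in range(K):
        counts[k] = 1.0
        sums[k] = bandit(k)

    for t in range(K, horizon):
        means = sums / counts
        # Standard UCB1 exploration index ...
        index = means + np.sqrt(2.0 * np.log(t + 1) / counts)
        # ... clipped by the prior bound: an arm whose bound is already
        # dominated stops attracting exploration pulls.
        index = np.minimum(index, upper_bounds)
        arm = int(np.argmax(index))
        counts[arm] += 1.0
        sums[arm] += bandit(arm)

    return counts, sums / counts

# Toy usage: Bernoulli arms with true means (0.3, 0.6); suppose a causal
# analysis bounded arm 0's mean by 0.4 (hypothetical numbers).
rng = np.random.default_rng(0)
counts, means = bounded_ucb(lambda k: float(rng.random() < (0.3, 0.6)[k]),
                            horizon=2000,
                            upper_bounds=np.array([0.4, 1.0]))
```

Clipping never loosens the standard index, so the sketch keeps UCB1's usual behavior while a tight prior bound can rule an arm out with few pulls; the paper's formal dominance and convergence results concern its own, more refined allocation procedure.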
Pages: 1340-1346 (7 pages)