Transfer Learning in Multi-Armed Bandits: A Causal Approach

Cited by: 0
Authors: Zhang, Junzhe [1]; Bareinboim, Elias [1]
Affiliations: [1] Purdue Univ, W Lafayette, IN 47907 USA
Keywords: (none listed)
DOI: (none)
CLC classification: TP18 [Artificial intelligence theory]
Discipline codes: 081104; 0812; 0835; 1405
Abstract
Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods for inferring the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully realized in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] and standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arms' distributions based on structural knowledge; second, incorporating these bounds in a dynamic allocation procedure so as to guide the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms, achieving convergence rates that are orders of magnitude faster. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
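To make the second step concrete, below is a minimal Python sketch, not the paper's actual procedure: a standard UCB1 allocation whose per-arm exploration indices are clipped by externally supplied upper bounds on the arm means, standing in for the causally derived bounds of step one. The function `bounded_ucb` and its arguments `bandit`, `horizon`, and `upper_bounds` are illustrative assumptions.

```python
import numpy as np

def bounded_ucb(bandit, horizon, upper_bounds):
    """Sketch: UCB1 whose indices are clipped by prior upper bounds.

    `bandit(k)` is assumed to return a reward in [0, 1] for arm k;
    `upper_bounds[k]` is an a-priori upper bound on arm k's mean,
    e.g. obtained from observational data via a bounding argument.
    """
    K = len(upper_bounds)
    counts = np.zeros(K)
    sums = np.zeros(K)

    # Initialize the empirical means by pulling each arm once.
    for k in range(K):
        counts[k] = 1.0
        sums[k] = bandit(k)

    for t in range(K, horizon):
        means = sums / counts
        # Standard UCB1 exploration index ...
        index = means + np.sqrt(2.0 * np.log(t + 1) / counts)
        # ... clipped by the prior bound: an arm whose bound is already
        # dominated stops attracting exploration pulls.
        index = np.minimum(index, upper_bounds)
        arm = int(np.argmax(index))
        counts[arm] += 1.0
        sums[arm] += bandit(arm)

    return counts, sums / counts

# Toy usage: Bernoulli arms with true means (0.3, 0.6); suppose a causal
# analysis bounded arm 0's mean by 0.4 (hypothetical numbers).
rng = np.random.default_rng(0)
counts, means = bounded_ucb(lambda k: float(rng.random() < (0.3, 0.6)[k]),
                            horizon=2000,
                            upper_bounds=np.array([0.4, 1.0]))
```

Clipping never loosens the standard index, so the sketch keeps UCB1's usual behavior while a tight prior bound can rule an arm out with few pulls; the paper's formal dominance and convergence results concern its own, more refined allocation procedure.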
Pages: 1340-1346 (7 pages)