Transfer Learning in Multi-Armed Bandits: A Causal Approach

Cited by: 0
Authors: Zhang, Junzhe [1]; Bareinboim, Elias [1]
Affiliations: [1] Purdue Univ, W Lafayette, IN 47907 USA
Keywords: none listed
DOI: none listed
CLC classification: TP18 [theory of artificial intelligence]
Discipline codes: 081104; 0812; 0835; 1405
Abstract:
Reinforcement learning (RL) agents have been deployed in complex environments where interactions are costly and learning is usually slow. One prominent task in these settings is to reuse interactions performed by other agents to accelerate the learning process. Causal inference provides a family of methods for inferring the effects of actions from a combination of data and qualitative assumptions about the underlying environment. Despite its success in transferring invariant knowledge across domains in the empirical sciences, causal inference has not been fully exploited in the context of transfer learning in interactive domains. In this paper, we use causal inference as a basis to support a principled and more robust transfer of knowledge in RL settings. In particular, we tackle the problem of transferring knowledge across bandit agents in settings where causal effects cannot be identified by do-calculus [Pearl, 2000] or standard learning techniques. Our new identification strategy combines two steps: first, deriving bounds over the arms' reward distributions based on structural knowledge; second, incorporating these bounds into a dynamic allocation procedure so as to guide the search towards more promising actions. We formally prove that our strategy dominates previously known algorithms, achieving orders-of-magnitude faster convergence rates. Finally, we perform simulations and empirically demonstrate that our strategy is consistently more efficient than the current (non-causal) state-of-the-art methods.
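The two-step strategy described in the abstract (derive causal bounds over the arm rewards, then fold them into the allocation rule) can be illustrated with a minimal sketch. The code below is not the authors' algorithm: it assumes upper bounds on each arm's mean reward have already been derived offline (the `causal_bounds` values here are hypothetical) and simply clips a standard UCB1 index at those bounds, so that an arm whose bound falls below the running best stops attracting exploration.

```python
import numpy as np

def clipped_ucb(arm_means, causal_bounds, horizon=5000, seed=0):
    """UCB1 on Bernoulli arms, with indices truncated at upper
    bounds on each arm's mean reward (assumed to be derived offline
    from structural/causal knowledge). Returns cumulative regret."""
    rng = np.random.default_rng(seed)
    K = len(arm_means)
    counts, sums = np.zeros(K), np.zeros(K)
    best, regret = max(arm_means), 0.0

    # Initialize by pulling each arm once.
    for k in range(K):
        counts[k] += 1
        sums[k] += rng.random() < arm_means[k]
        regret += best - arm_means[k]

    for t in range(K, horizon):
        # Standard UCB1 exploration index ...
        ucb = sums / counts + np.sqrt(2.0 * np.log(t + 1) / counts)
        # ... clipped at the causal bound: an arm can never look
        # more promising than its bound allows, so arms whose bound
        # lies below the best achievable mean are discarded early.
        k = int(np.argmax(np.minimum(ucb, causal_bounds)))
        counts[k] += 1
        sums[k] += rng.random() < arm_means[k]
        regret += best - arm_means[k]
    return regret

# Hypothetical two-arm instance: the bound 0.3 on arm 0 rules it
# out quickly, since arm 1's empirical mean soon exceeds it.
print(clipped_ucb(arm_means=[0.2, 0.5], causal_bounds=[0.3, 1.0]))
```

In the paper itself, the bounds are derived from the causal model of the confounded bandit instance and combined with a more refined kl-UCB-style index, but the clipping idea is the same: the bound caps how optimistic the learner may be about each arm.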
Pages: 1340-1346
Page count: 7