Qualitative Multi-Armed Bandits: A Quantile-Based Approach

被引：0

作者：

Szorenyi, Balazs ^{[1
,5
,6
]}

Busa-Fekete, Robert ^{[2
]}

Weng, Paul ^{[3
,4
]}

Huellermeier, Eyke ^{[2
]}

机构：

[1] INRIA Lille Nord Europe, SequeL Project, 40 Ave Halley, F-59650 Villeneuve Dascq, France

[2] Univ Paderborn, Dept Comp Sci, D-33098 Paderborn, Germany

[3] SYSU CMU Joint Inst Engn, Guangzhou 510006, Guangdong, Peoples R China

[4] SYSU CMU Shunde Int Joint Res Inst, Shunde 528300, Peoples R China

[5] MTA SZTE Res Grp Artificial Intelligence, H-6720 Szeged, Hungary

[6] Technion Israel Inst Technol, Dept Elect Engn, IL-32000 Haifa, Israel

来源：

INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 37 | 2015年 / 37卷

关键词：

BOUNDS;

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We formalize and study the multi-armed bandit (MAB) problem in a generalized stochastic setting, in which rewards are not assumed to be numerical. Instead, rewards are measured on a qualitative scale that allows for comparison but invalidates arithmetic operations such as averaging. Correspondingly, instead of characterizing an arm in terms of the mean of the underlying distribution, we opt for using a quantile of that distribution as a representative value. We address the problem of quantile-based online learning both for the case of a finite (pure exploration) and infinite time horizon (cumulative regret minimization). For both cases, we propose suitable algorithms and analyze their properties. These properties are also illustrated by means of first experimental studies.

引用

页码：1660 / 1668

页数：9

共 50 条

[21] Multi-armed bandits with episode context
Christopher D. Rosin
Annals of Mathematics and Artificial Intelligence, 2011, 61 : 203 - 230
[22] MULTI-ARMED BANDITS AND THE GITTINS INDEX
WHITTLE, P
JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1980, 42 (02): : 143 - 149
[23] Multi-armed bandits with switching penalties
Asawa, M
Teneketzis, D
IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1996, 41 (03) : 328 - 348
[24] On Optimal Foraging and Multi-armed Bandits
Srivastava, Vaibhav
Reverdy, Paul
Leonard, Naomi E.
2013 51ST ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2013, : 494 - 499
[25] Active Learning in Multi-armed Bandits
Antos, Andras
Grover, Varun
Szepesvari, Csaba
ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254 : 287 - +
[26] Multi-Armed Bandits with Cost Subsidy
Sinha, Deeksha
Sankararama, Karthik Abinav
Kazerouni, Abbas
Avadhanula, Vashist
24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
[27] Multi-Armed Bandits With Correlated Arms
Gupta, Samarth
Chaudhari, Shreyas
Joshi, Gauri
Yagan, Osman
IEEE TRANSACTIONS ON INFORMATION THEORY, 2021, 67 (10) : 6711 - 6732
[28] Batched Multi-armed Bandits Problem
Gao, Zijun
Han, Yanjun
Ren, Zhimei
Zhou, Zhengqing
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
[29] Are Multi-Armed Bandits Susceptible to Peeking?
Loecher, Markus
ZAGREB INTERNATIONAL REVIEW OF ECONOMICS & BUSINESS, 2018, 21 (01): : 95 - 104
[30] Secure Outsourcing of Multi-Armed Bandits
Ciucanu, Radu
Lafourcade, Pascal
Lombard-Platet, Marius
Soare, Marta
2020 IEEE 19TH INTERNATIONAL CONFERENCE ON TRUST, SECURITY AND PRIVACY IN COMPUTING AND COMMUNICATIONS (TRUSTCOM 2020), 2020, : 202 - 209

← 1 2 3 4 5 →