Budgeted Multi-Armed Bandit in Continuous Action Space

Cited by: 6
Authors:
Trovo, Francesco [1 ]
Paladino, Stefano [1 ]
Restelli, Marcello [1 ]
Gatti, Nicola [1 ]
Affiliation:
[1] Politecnico di Milano, Milan, Italy
Source:
ECAI 2016: 22nd European Conference on Artificial Intelligence | 2016, Vol. 285
DOI:
10.3233/978-1-61499-672-9-560
CLC Classification:
TP18 [Theory of Artificial Intelligence]
Subject Classification Codes:
081104; 0812; 0835; 1405
Abstract:
Multi-Armed Bandits (MABs) have been widely studied over the last decade to model settings in which an agent wants to learn, during the operational life of a system, which action among a fixed set of available actions provides the highest expected reward. Classical techniques minimize the regret due to learning in settings where pulling an arm has no cost. However, in many real-world applications the learner pays a cost for each pull, and the learning process is constrained by a fixed budget B. This problem is known in the literature as the Budgeted MAB (BMAB). In this paper, for the first time, we study the Budgeted Continuous-Armed Bandit (BCAB) problem, in which the set of possible actions is a continuous set (e.g., a range of prices) and the learner receives a random reward and incurs a random cost at each round. We provide a novel algorithm, named B-Zoom, which suffers a regret of $\tilde{O}(B^{\frac{d+1}{d+2}})$, where d is the zooming dimension of the problem. Finally, we provide an empirical analysis showing that, despite a lower average performance, the proposed approach is more robust to adverse settings than existing algorithms designed for the BMAB.
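The regret bound above concerns B-Zoom's adaptive discretization of the continuous action space, whose exact rules are given in the paper. As a rough illustration of the budgeted continuous-armed setting it addresses, the following is a minimal Python sketch; the fixed-grid strategy, the function names, and the reward/cost interface are all illustrative assumptions, not the authors' algorithm. It discretizes [0, 1] into a uniform grid of arms and spends the budget B on the arm with the best optimistic reward-to-cost ratio, a standard index for budgeted bandits.

import math
import random

def budgeted_grid_bandit(reward_fn, cost_fn, budget, n_arms=16):
    """Spend `budget` on a uniform grid of arms over [0, 1], pulling the
    arm with the best optimistic reward-to-cost ratio (UCB-style index).
    `reward_fn` and `cost_fn` map an action x in [0, 1] to random values
    in (0, 1]; this interface is assumed for illustration only."""
    arms = [(i + 0.5) / n_arms for i in range(n_arms)]
    pulls = [0] * n_arms
    rew_sum = [0.0] * n_arms
    cost_sum = [0.0] * n_arms
    t, total_reward = 0, 0.0
    while budget > 0:
        t += 1
        if t <= n_arms:
            i = t - 1  # initialization: pull each arm once
        else:
            def index(j):
                bonus = math.sqrt(2.0 * math.log(t) / pulls[j])
                r_hat = rew_sum[j] / pulls[j]   # empirical mean reward
                c_hat = cost_sum[j] / pulls[j]  # empirical mean cost
                # optimistic reward over a pessimistic (but positive) cost
                return (r_hat + bonus) / max(c_hat - bonus, 1e-6)
            i = max(range(n_arms), key=index)
        r, c = reward_fn(arms[i]), cost_fn(arms[i])
        pulls[i] += 1
        rew_sum[i] += r
        cost_sum[i] += c
        budget -= c
        total_reward += r
    return total_reward

# Hypothetical usage: reward peaks near x = 0.7, cost grows with x.
total = budgeted_grid_bandit(
    reward_fn=lambda x: min(1.0, max(0.01, 1.0 - abs(x - 0.7) + random.gauss(0.0, 0.05))),
    cost_fn=lambda x: min(1.0, 0.2 + 0.5 * x + random.uniform(0.0, 0.1)),
    budget=500.0,
)

A fixed grid pays an approximation error that an adaptive (zooming) discretization avoids; that adaptivity is where the zooming dimension d enters B-Zoom's regret bound.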
Pages: 560-568
Number of pages: 9