Budgeted Multi-Armed Bandit in Continuous Action Space

Cited: 6
Authors
Trovo, Francesco [1]
Paladino, Stefano [1]
Restelli, Marcello [1]
Gatti, Nicola [1]
Affiliation
[1] Politecnico di Milano, Milan, Italy
Source
ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2016 / Vol. 285
DOI
10.3233/978-1-61499-672-9-560
CLC Classification
TP18 [Theory of Artificial Intelligence]
Discipline Codes
081104; 0812; 0835; 1405
Abstract
Multi-Armed Bandits (MABs) have been widely studied in the last decade to model settings in which an agent wants to learn, during the operational life of a system, which action provides the highest expected reward among a fixed set of available actions. Classical techniques minimize the regret due to learning in settings where selecting an arm has no cost. However, in many real-world applications the learner has to pay some cost for pulling each arm, and the learning process is constrained by a fixed budget B. This problem is addressed in the literature as the Budgeted MAB (BMAB). In this paper, for the first time, we study the problem of the Budgeted Continuous-Armed Bandit (BCAB), where the set of possible actions is a continuous set (e.g., a range of prices) and the learner receives a random reward and incurs a random cost at each round. We provide a novel algorithm, named B-Zoom, which suffers a regret of Õ(B^((d+1)/(d+2))), where d is the zooming dimension of the problem. Finally, we provide an empirical analysis showing that, despite a lower average performance, the proposed approach is more robust to adverse settings than existing algorithms designed for BMAB.
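The budgeted setting described in the abstract can be illustrated with a toy finite-arm variant: keep pulling arms until the budget B is exhausted, ranking arms by an empirical reward/cost ratio plus an exploration bonus. Everything below (the function name, the Bernoulli-reward/noisy-cost arm model, and the ratio index) is an illustrative assumption in the spirit of the BMAB literature, not the paper's B-Zoom algorithm, which additionally handles continuous action spaces via adaptive zooming.

```python
import math
import random

def budgeted_ucb(arms, budget, seed=0):
    """Toy budgeted bandit (illustrative sketch, not B-Zoom).

    arms   -- list of (mean_reward, mean_cost) pairs; each pull draws a
              Bernoulli reward and a cost jittered around the mean cost.
    budget -- total budget B; pulling stops once cumulative cost exceeds it.
    Returns the total reward collected.
    """
    rng = random.Random(seed)
    k = len(arms)
    pulls = [0] * k
    reward_sum = [0.0] * k
    cost_sum = [0.0] * k
    total_reward, spent, t = 0.0, 0.0, 0

    def pull(i):
        nonlocal total_reward, spent, t
        mu_r, mu_c = arms[i]
        r = 1.0 if rng.random() < mu_r else 0.0           # Bernoulli reward
        c = max(1e-3, mu_c + rng.uniform(-0.05, 0.05))     # noisy positive cost
        pulls[i] += 1
        reward_sum[i] += r
        cost_sum[i] += c
        total_reward += r
        spent += c
        t += 1

    for i in range(k):          # initialization: pull each arm once
        pull(i)

    while spent < budget:
        # index: empirical reward/cost ratio plus a UCB-style exploration bonus
        def index(i):
            bonus = math.sqrt(2.0 * math.log(t) / pulls[i])
            return reward_sum[i] / cost_sum[i] + bonus
        pull(max(range(k), key=index))

    return total_reward
```

With two equal-cost arms whose mean rewards differ sharply, the ratio index concentrates pulls on the better arm once the exploration bonus shrinks, which is the basic mechanism budgeted confidence-bound algorithms exploit.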
Pages: 560-568
Page count: 9
Related Papers
50 items in total
[21]   Budgeted Multi-Armed Bandits with Asymmetric Confidence Intervals [J].
Heyden, Marco ;
Arzamasov, Vadim ;
Fouche, Edouard ;
Boehm, Klemens .
PROCEEDINGS OF THE 30TH ACM SIGKDD CONFERENCE ON KNOWLEDGE DISCOVERY AND DATA MINING, KDD 2024, 2024, :1073-1084
[22]   ON MULTI-ARMED BANDIT PROBLEM WITH NUISANCE PARAMETER [J].
Sun, Jiayang .
Science China Mathematics, 1986, (05) :464-475
[23]   Multi-armed bandit algorithms and empirical evaluation [J].
Vermorel, J ;
Mohri, M .
MACHINE LEARNING: ECML 2005, PROCEEDINGS, 2005, 3720 :437-448
[24]   Sustainable Cooperative Coevolution with a Multi-Armed Bandit [J].
De Rainville, Francois-Michel ;
Sebag, Michele ;
Gagne, Christian ;
Schoenauer, Marc ;
Laurendeau, Denis .
GECCO'13: PROCEEDINGS OF THE 2013 GENETIC AND EVOLUTIONARY COMPUTATION CONFERENCE, 2013, :1517-1524
[25]   Characterizing Truthful Multi-Armed Bandit Mechanisms [J].
Babaioff, Moshe ;
Sharma, Yogeshwer ;
Slivkins, Aleksandrs .
10TH ACM CONFERENCE ON ELECTRONIC COMMERCE - EC 2009, 2009, :79-88
[26]   Identifying Outlier Arms in Multi-Armed Bandit [J].
Zhuang, Honglei ;
Wang, Chi ;
Wang, Yifan .
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 30 (NIPS 2017), 2017, 30
[27]   Robust control of the multi-armed bandit problem [J].
Caro, Felipe ;
Das Gupta, Aparupa .
ANNALS OF OPERATIONS RESEARCH, 2022, 317 (02) :461-480
[28]   Generic Outlier Detection in Multi-Armed Bandit [J].
Ban, Yikun ;
He, Jingrui .
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, :913-923
[29]   Anytime Algorithms for Multi-Armed Bandit Problems [J].
Kleinberg, Robert .
PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, :928-936
[30]   Achieving Privacy in the Adversarial Multi-Armed Bandit [J].
Tossou, Aristide C. Y. ;
Dimitrakakis, Christos .
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, :2653-2659