Budgeted Multi-Armed Bandit in Continuous Action Space

Cited by: 7
Authors
Trovo, Francesco [1 ]
Paladino, Stefano [1 ]
Restelli, Marcello [1 ]
Gatti, Nicola [1 ]
Institutions
[1] Politecn Milan, Milan, Italy
Source
ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2016 / Vol. 285
Keywords
DOI
10.3233/978-1-61499-672-9-560
CLC Classification
TP18 [Artificial Intelligence Theory];
Subject Classification
081104; 0812; 0835; 1405
Abstract
Multi-Armed Bandits (MABs) have been widely studied over the last decade to model settings in which an agent wants to learn, during the operational life of a system, which action provides the highest expected reward among a fixed set of available actions. Classical techniques minimize the regret due to learning in settings where selecting an arm has no cost. However, in many real-world applications the learner has to pay a cost for pulling each arm, and the learning process is constrained by a fixed budget B. This problem is addressed in the literature as the Budgeted MAB (BMAB). In this paper, for the first time, we study the problem of the Budgeted Continuous-Armed Bandit (BCAB), where the set of possible actions is a continuous set (e.g., a range of prices) and the learner receives a random reward and incurs a random cost at each round. We provide a novel algorithm, named B-Zoom, which suffers a regret of $\tilde{O}(B^{(d+1)/(d+2)})$, where d is the zooming dimension of the problem. Finally, we provide an empirical analysis showing that, despite a lower average performance, the proposed approach is more robust to adverse settings than existing algorithms designed for the BMAB.
Pages: 560-568
Page count: 9
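
The abstract sketches the core mechanics: adaptive discretization ("zooming") of a continuous action set, combined with budget accounting over random per-pull costs. Below is a minimal illustrative Python sketch of such a loop, not the authors' B-Zoom: the reward_fn/cost_fn/budget interface, the uniform activation probe, and the optimistic reward-to-cost index are assumptions chosen only to make the idea concrete.

```python
import math
import random

def bcab_zoom_sketch(reward_fn, cost_fn, budget, seed=0):
    """Zooming-style loop for a budgeted continuous-armed bandit on [0, 1].

    Illustrative only: the activation rule and the optimistic
    reward-to-cost index below are generic placeholders, not the
    exact rules of the paper's B-Zoom algorithm."""
    rng = random.Random(seed)
    # Each active arm: location x, pull count n, reward sum r, cost sum c.
    arms = [{"x": 0.5, "n": 0, "r": 0.0, "c": 0.0}]
    spent, t = 0.0, 0

    def radius(arm):
        # Confidence radius; shrinks as the arm accumulates pulls.
        return math.sqrt(2.0 * math.log(t + 1) / max(arm["n"], 1))

    def index(arm):
        # Optimistic index on the empirical reward-to-cost ratio;
        # unpulled arms get priority so every active arm is tried once.
        if arm["n"] == 0:
            return float("inf")
        return arm["r"] / max(arm["c"], 1e-9) + radius(arm)

    while spent < budget:
        t += 1
        # Activation: if a probed point lies outside every confidence
        # ball, open a new arm there (this is the "zooming" step).
        x = rng.random()
        if all(abs(x - a["x"]) > radius(a) for a in arms):
            arms.append({"x": x, "n": 0, "r": 0.0, "c": 0.0})
        best = max(arms, key=index)
        r, c = reward_fn(best["x"]), cost_fn(best["x"])
        best["n"] += 1
        best["r"] += r
        best["c"] += c
        spent += c  # the budget is consumed by the random pull cost

    return max(arms, key=lambda a: a["r"] / max(a["c"], 1e-9))

# Toy pricing example: expected reward p * (1 - p) peaks at p = 0.5.
if __name__ == "__main__":
    noise = random.Random(1)
    best = bcab_zoom_sketch(
        reward_fn=lambda p: p * (1.0 - p) + noise.gauss(0.0, 0.05),
        cost_fn=lambda p: 0.1 + 0.05 * noise.random(),
        budget=100.0,
    )
    print("estimated best price:", round(best["x"], 3))
```

The design point the sketch mirrors is that the budget B, rather than a round count, terminates learning, which is why the regret bound in the abstract is stated in terms of B.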
Related Papers
50 items in total
[31]   Achieving Privacy in the Adversarial Multi-Armed Bandit [J].
Tossou, Aristide C. Y. ;
Dimitrakakis, Christos .
THIRTY-FIRST AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, 2017, :2653-2659
[32]   Generic Outlier Detection in Multi-Armed Bandit [J].
Ban, Yikun ;
He, Jingrui .
KDD '20: PROCEEDINGS OF THE 26TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, 2020, :913-923
[33]   Anytime Algorithms for Multi-Armed Bandit Problems [J].
Kleinberg, Robert .
PROCEEDINGS OF THE SEVENTEENTH ANNUAL ACM-SIAM SYMPOSIUM ON DISCRETE ALGORITHMS, 2006, :928-936
[34]   Multi-Armed Recommender System Bandit Ensembles [J].
Cañamares, Rocío ;
Redondo, Marcos ;
Castells, Pablo .
RECSYS 2019: 13TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, 2019, :432-436
[35]   Multi-armed Bandit Mechanism with Private Histories [J].
Liu, Chang ;
Cai, Qingpeng ;
Zhang, Yukui .
AAMAS'17: PROCEEDINGS OF THE 16TH INTERNATIONAL CONFERENCE ON AUTONOMOUS AGENTS AND MULTIAGENT SYSTEMS, 2017, :1607-1609
[36]   Robust control of the multi-armed bandit problem [J].
Caro, Felipe ;
Das Gupta, Aparupa .
ANNALS OF OPERATIONS RESEARCH, 2022, 317 :461-480
[37]   Noise Free Multi-armed Bandit Game [J].
Nakamura, Atsuyoshi ;
Helmbold, David P. ;
Warmuth, Manfred K. .
LANGUAGE AND AUTOMATA THEORY AND APPLICATIONS, LATA 2016, 2016, 9618 :412-423
[38]   Ambiguity aversion in multi-armed bandit problems [J].
Anderson, Christopher M. .
THEORY AND DECISION, 2012, 72 (01) :15-33
[39]   CHARACTERIZING TRUTHFUL MULTI-ARMED BANDIT MECHANISMS [J].
Babaioff, Moshe ;
Sharma, Yogeshwer ;
Slivkins, Aleksandrs .
SIAM JOURNAL ON COMPUTING, 2014, 43 (01) :194-230
[40]   Multi-armed Bandit Problems with Strategic Arms [J].
Braverman, Mark ;
Mao, Jieming ;
Schneider, Jon ;
Weinberg, S. Matthew .
CONFERENCE ON LEARNING THEORY, 2019, VOL 99