Budgeted Multi-Armed Bandit in Continuous Action Space

Cited by: 7
Authors
Trovo, Francesco [1 ]
Paladino, Stefano [1 ]
Restelli, Marcello [1 ]
Gatti, Nicola [1 ]
Institutions
[1] Politecn Milan, Milan, Italy
Source
ECAI 2016: 22ND EUROPEAN CONFERENCE ON ARTIFICIAL INTELLIGENCE | 2016, Vol. 285
DOI: 10.3233/978-1-61499-672-9-560
CLC Number: TP18 [Artificial Intelligence Theory]
Subject Classification Codes: 081104; 0812; 0835; 1405
Abstract
Multi-Armed Bandits (MABs) have been widely studied over the last decade to model settings in which an agent wants to learn, during the operational life of a system, which action among a fixed set of available actions provides the highest expected reward. Classical techniques minimize the regret due to learning in settings where pulling an arm has no cost. However, in many real-world applications the learner pays a cost for each arm pull, and the learning process is constrained by a fixed budget B. This problem is known in the literature as the Budgeted MAB (BMAB). In this paper, for the first time, we study the problem of the Budgeted Continuous-Armed Bandit (BCAB), where the set of possible actions is a continuous set (e.g., a range of prices) and the learner receives a random reward and incurs a random cost at each round. We provide a novel algorithm, named B-Zoom, which suffers a regret of Õ(B^((d+1)/(d+2))), where d is the zooming dimension of the problem. Finally, we provide an empirical analysis showing that, despite a lower average performance, the proposed approach is more robust to adverse settings than existing algorithms designed for the BMAB.
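The budgeted setting described in the abstract can be illustrated with a minimal sketch. This is not the paper's B-Zoom algorithm: it is a generic UCB-style index on the empirical reward/cost ratio, run over a pre-discretized action set (B-Zoom instead adaptively zooms into the continuous set). The function name `budgeted_ucb` and the fixed discretization are illustrative assumptions, and strictly positive per-pull costs are assumed so the ratio is well defined.

```python
# Illustrative sketch of a budgeted bandit (NOT the paper's B-Zoom):
# pull arms until the budget B is exhausted, ranking arms by their
# empirical reward/cost ratio plus a UCB exploration bonus.
import math

def budgeted_ucb(reward_fn, cost_fn, actions, budget):
    """Run a budgeted UCB loop; return the total reward collected."""
    k = len(actions)
    n = [0] * k            # pull counts per arm
    r_sum = [0.0] * k      # cumulative reward per arm
    c_sum = [0.0] * k      # cumulative cost per arm (assumed > 0 per pull)
    total_reward, t = 0.0, 0
    while budget > 0:
        t += 1
        if t <= k:
            i = t - 1      # initialization: pull each arm once
        else:
            # index = empirical reward/cost ratio + exploration bonus
            i = max(range(k), key=lambda j:
                    r_sum[j] / c_sum[j] + math.sqrt(2 * math.log(t) / n[j]))
        r, c = reward_fn(actions[i]), cost_fn(actions[i])
        n[i] += 1
        r_sum[i] += r
        c_sum[i] += c
        budget -= c        # the pull consumes part of the budget
        total_reward += r
    return total_reward
```

With actions taken as a uniform grid over a price range, this loop mimics the BCAB setting at a coarse resolution; the regret guarantee Õ(B^((d+1)/(d+2))) of the abstract applies only to the paper's adaptive-discretization algorithm, not to this fixed-grid sketch.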
Pages: 560-568 (9 pages)
Related Papers (50 total)
[41] Zhang X.; Zhou Q.; Liang B.; Xu J. An Adaptive Algorithm in Multi-Armed Bandit Problem [J]. Jisuanji Yanjiu yu Fazhan/Computer Research and Development, 2019, 56(03): 643-654.
[42] Watanabe R.; Komiyama J.; Nakamura A.; Kudo M. UCB-SC: A Fast Variant of KL-UCB-SC for Budgeted Multi-Armed Bandit Problem [J]. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 2018, E101A(03): 662-667.
[43] Lima J. A. P.; Vergilio S. R. A Multi-Armed Bandit Approach for Test Case Prioritization in Continuous Integration Environments [J]. IEEE Transactions on Software Engineering, 2022, 48(02): 453-465.
[44] Bouneffouf D.; Feraud R. Multi-armed bandit problem with known trend [J]. Neurocomputing, 2016, 205: 16-21.
[45] Ferreira A. S.; Goncalves R. A.; Ramirez Pozo A. T. A Multi-Armed Bandit Hyper-Heuristic [J]. 2015 Brazilian Conference on Intelligent Systems (BRACIS 2015), 2015: 13-18.
[46] Anderson C. M. Ambiguity aversion in multi-armed bandit problems [J]. Theory and Decision, 2012, 72: 15-33.
[47] Urteaga I.; Wiggins C. H. Variational inference for the multi-armed contextual bandit [J]. International Conference on Artificial Intelligence and Statistics, 2018, Vol. 84.
[48] Chen N.; Yang S.; Zhang H. Bridging Adversarial and Nonstationary Multi-Armed Bandit [J]. Production and Operations Management, 2025, 34(08): 2218-2231.
[49] Even-Dar E.; Mannor S.; Mansour Y. Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems [J]. Journal of Machine Learning Research, 2006, 7: 1079-1105.
[50] Wang H.; Cui L.; Wang E.; Liu J. A Differentially Private Approach for Budgeted Combinatorial Multi-Armed Bandits [J]. IEEE Transactions on Dependable and Secure Computing, 2025, 22(01): 424-439.