Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

被引:0
作者
Maiti, Arnab [1 ,3 ]
Patil, Vishakha [2 ]
Khan, Arindam [2 ]
机构
[1] Indian Inst Technol, Kharagpur, W Bengal, India
[2] Indian Inst Sci, Bangalore, Karnataka, India
[3] IISc, Bangalore, Karnataka, India
来源
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021年 / 34卷
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show Omega(T-2/3) cumulative regret in expectation for single-pass algorithms for arm-memory size of (n = 1), where n is the number of arms. For best-arm identification, we provide an (epsilon, delta)-PAC algorithm with arm-memory size of O (log* n) and O(n/epsilon(2) . log(1/delta)) optimal sample complexity.
引用
收藏
页数:13
相关论文
共 35 条
  • [21] Karlin A, 2015, AAAI CONF ARTIF INTE, P944
  • [22] Bandits and Experts in Metric Spaces
    Kleinberg, Robert
    Slivkins, Aleksandrs
    Upfal, Eli
    [J]. JOURNAL OF THE ACM, 2019, 66 (04)
  • [23] Liau D., 2018, International Conference on Artificial Intelligence and Statistics, V84, P386
  • [24] Long TT, 2014, UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, P809
  • [25] Efficient crowdsourcing of unknown experts using bounded multi-armed bandits
    Long Tran-Thanh
    Stein, Sebastian
    Rogers, Alex
    Jennings, Nicholas R.
    [J]. ARTIFICIAL INTELLIGENCE, 2014, 214 : 89 - 111
  • [26] Lu CJ, 2011, LECT NOTES ARTIF INT, V6925, P249, DOI 10.1007/978-3-642-24412-4_21
  • [27] Pekoz E. A., 2003, Journal of Applied Probability, V40, P250, DOI 10.1239/jap/1044476838
  • [28] Rai P, 2009, 21ST INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-09), PROCEEDINGS, P1211
  • [29] SOME ASPECTS OF THE SEQUENTIAL DESIGN OF EXPERIMENTS
    ROBBINS, H
    [J]. BULLETIN OF THE AMERICAN MATHEMATICAL SOCIETY, 1952, 58 (05) : 527 - 535
  • [30] Introduction to Multi-Armed Bandits Preface
    Slivkins, Aleksandrs
    [J]. FOUNDATIONS AND TRENDS IN MACHINE LEARNING, 2019, 12 (1-2): : 1 - 286