Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

Times Cited: 0
Authors
Maiti, Arnab [1 ,3 ]
Patil, Vishakha [2 ]
Khan, Arindam [2 ]
Affiliations
[1] Indian Inst Technol, Kharagpur, W Bengal, India
[2] Indian Inst Sci, Bangalore, Karnataka, India
[3] IISc, Bangalore, Karnataka, India
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
DOI
Not available
Chinese Library Classification (CLC) Code
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the Stochastic Multi-Armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in memory at any time is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show Ω(T^{2/3}) cumulative regret in expectation for single-pass algorithms with arm-memory size of (n − 1), where n is the number of arms. For best-arm identification, we provide an (ε, δ)-PAC algorithm with arm-memory size of O(log* n) and optimal sample complexity O((n/ε^2) · log(1/δ)).
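The sketch below is a minimal, self-contained Python illustration of the setting described in the abstract: arms arrive one at a time in a stream, at most a fixed number of arm indices may be kept in memory, and only in-memory arms can be pulled. The explore-then-evict rule and all names (StreamingBandit, run_single_pass, memory_size, pulls_per_arm) are illustrative assumptions, not the paper's algorithm or notation.

```python
import random

# Sketch of the bounded arm-memory streaming bandit setting (assumed names,
# placeholder exploration rule; NOT the authors' algorithm).

class StreamingBandit:
    """Arms arrive one at a time; pulling arm i returns a Bernoulli reward
    with unknown mean means[i]."""

    def __init__(self, means):
        self.means = means   # true (hidden) arm means
        self.pulls = 0       # total number of pulls made so far

    def pull(self, arm_index):
        self.pulls += 1
        return 1.0 if random.random() < self.means[arm_index] else 0.0


def run_single_pass(bandit, n_arms, memory_size, pulls_per_arm):
    """Single-pass baseline: sample each incoming arm a fixed number of times,
    keep at most `memory_size` arms in memory, and evict the empirically worst
    arm whenever the memory overflows. Only in-memory arms may be pulled."""
    memory = []  # list of (empirical_mean, arm_index), |memory| <= memory_size

    for arm in range(n_arms):                  # arms arrive in a stream
        rewards = [bandit.pull(arm) for _ in range(pulls_per_arm)]
        emp_mean = sum(rewards) / pulls_per_arm
        memory.append((emp_mean, arm))
        if len(memory) > memory_size:
            memory.remove(min(memory))         # a discarded arm is gone for good

    return max(memory)[1]                      # empirically best arm still in memory


if __name__ == "__main__":
    random.seed(0)
    means = [0.3, 0.5, 0.9, 0.4, 0.6, 0.2]
    bandit = StreamingBandit(means)
    best = run_single_pass(bandit, n_arms=len(means), memory_size=2, pulls_per_arm=200)
    print(f"arm returned: {best}, total pulls: {bandit.pulls}")
```

Evicting the empirically worst arm after each arrival keeps at most memory_size arms stored and means a discarded arm can never be pulled again, which is the defining constraint of the bounded arm-memory streaming model studied in the paper.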
Number of Pages: 13