Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

被引:0
作者
Maiti, Arnab [1 ,3 ]
Patil, Vishakha [2 ]
Khan, Arindam [2 ]
机构
[1] Indian Inst Technol, Kharagpur, W Bengal, India
[2] Indian Inst Sci, Bangalore, Karnataka, India
[3] IISc, Bangalore, Karnataka, India
来源
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021年 / 34卷
关键词
D O I
暂无
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show Omega(T-2/3) cumulative regret in expectation for single-pass algorithms for arm-memory size of (n = 1), where n is the number of arms. For best-arm identification, we provide an (epsilon, delta)-PAC algorithm with arm-memory size of O (log* n) and O(n/epsilon(2) . log(1/delta)) optimal sample complexity.
引用
收藏
页数:13
相关论文
共 35 条
  • [11] Chaurasia AR, 2020, POPULATION AND SUSTAINABLE DEVELOPMENT IN INDIA, P85, DOI 10.1007/978-981-32-9212-3_5
  • [12] A NOTE ON THE 2-ARMED BANDIT PROBLEM WITH FINITE MEMORY
    COVER, TM
    [J]. INFORMATION AND CONTROL, 1968, 12 (5-6): : 371 - &
  • [13] Dagan Y., 2018, C LEARN THEOR, P1145
  • [14] de Heide Rianne, 2021, ARXIV210312452
  • [15] Even-Dar E., 2002, Computational Learning Theory. 15th Annual Conference on Computational Learning Theory, COLT 2002. Proceedings (Lecture Notes in Artificial Intelligence Vol.2375), P255
  • [16] Falahatgar M, 2020, PR MACH LEARN RES, V119
  • [17] Gomes Heitor Murilo, 2019, ACM SIGKDD Explorations Newsletter, V21, P6, DOI 10.1145/3373464.3373470
  • [18] Policies without memory for the infinite-armed Bernoulli bandit under the average-reward criterion
    Herschkorn, SJ
    Pekoz, E
    Ross, SM
    [J]. PROBABILITY IN THE ENGINEERING AND INFORMATIONAL SCIENCES, 1996, 10 (01) : 21 - 28
  • [19] Jin T., 2021, INT C MACHINE LEARN, P5045
  • [20] Kalvit Anand, 2021, ARXIV210510721