Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

被引：0

作者：

Maiti, Arnab ^{[1
,3
]}

Patil, Vishakha ^{[2
]}

Khan, Arindam ^{[2
]}

机构：

[1] Indian Inst Technol, Kharagpur, W Bengal, India

[2] Indian Inst Sci, Bangalore, Karnataka, India

[3] IISc, Bangalore, Karnataka, India

来源：

ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021年 / 34卷

关键词：

D O I：

暂无

中图分类号：

TP18 [人工智能理论];

学科分类号：

081104 ; 0812 ; 0835 ; 1405 ;

摘要：

We study the Stochastic Multi-armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in the memory at any time, is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show Omega(T-2/3) cumulative regret in expectation for single-pass algorithms for arm-memory size of (n = 1), where n is the number of arms. For best-arm identification, we provide an (epsilon, delta)-PAC algorithm with arm-memory size of O (log* n) and O(n/epsilon(2) . log(1/delta)) optimal sample complexity.

引用

页数：13

共 35 条

[11] Chaurasia AR, 2020, POPULATION AND SUSTAINABLE DEVELOPMENT IN INDIA, P85, DOI 10.1007/978-981-32-9212-3_5
[12] A NOTE ON THE 2-ARMED BANDIT PROBLEM WITH FINITE MEMORY
COVER, TM
[J]. INFORMATION AND CONTROL, 1968, 12 (5-6): : 371 - &
[13] Dagan Y., 2018, C LEARN THEOR, P1145
[14] de Heide Rianne, 2021, ARXIV210312452
[15] Even-Dar E., 2002, Computational Learning Theory. 15th Annual Conference on Computational Learning Theory, COLT 2002. Proceedings (Lecture Notes in Artificial Intelligence Vol.2375), P255
[16] Falahatgar M, 2020, PR MACH LEARN RES, V119
[17] Gomes Heitor Murilo, 2019, ACM SIGKDD Explorations Newsletter, V21, P6, DOI 10.1145/3373464.3373470
[18] Policies without memory for the infinite-armed Bernoulli bandit under the average-reward criterion
Herschkorn, SJ
Pekoz, E
Ross, SM
[J]. PROBABILITY IN THE ENGINEERING AND INFORMATIONAL SCIENCES, 1996, 10 (01) : 21 - 28
[19] Jin T., 2021, INT C MACHINE LEARN, P5045
[20] Kalvit Anand, 2021, ARXIV210510721

← 1 2 3 4 →