Multi-Armed Bandits with Bounded Arm-Memory: Near-Optimal Guarantees for Best-Arm Identification and Regret Minimization

Times Cited: 0
Authors
Maiti, Arnab [1 ,3 ]
Patil, Vishakha [2 ]
Khan, Arindam [2 ]
Affiliations
[1] Indian Inst Technol, Kharagpur, W Bengal, India
[2] Indian Inst Sci, Bangalore, Karnataka, India
[3] IISc, Bangalore, Karnataka, India
Source
ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 34 (NEURIPS 2021) | 2021 / Vol. 34
Keywords
DOI
Not available
Chinese Library Classification (CLC) Code
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We study the Stochastic Multi-Armed Bandit problem under bounded arm-memory. In this setting, the arms arrive in a stream, and the number of arms that can be stored in memory at any time is bounded. The decision-maker can only pull arms that are present in the memory. We address the problem from the perspective of two standard objectives: 1) regret minimization, and 2) best-arm identification. For regret minimization, we settle an important open question by showing an almost tight guarantee. We show Ω(T^{2/3}) cumulative regret in expectation for single-pass algorithms with arm-memory size of (n − 1), where n is the number of arms. For best-arm identification, we provide an (ε, δ)-PAC algorithm with arm-memory size of O(log* n) and optimal sample complexity O((n/ε^2) · log(1/δ)).
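The sketch below is a minimal, self-contained Python illustration of the setting described in the abstract: arms arrive one at a time in a stream, at most a fixed number of arm indices may be kept in memory, and only in-memory arms can be pulled. The explore-then-evict rule and all names (StreamingBandit, run_single_pass, memory_size, pulls_per_arm) are illustrative assumptions, not the paper's algorithm or notation.

```python
import random

# Sketch of the bounded arm-memory streaming bandit setting (assumed names,
# placeholder exploration rule; NOT the authors' algorithm).

class StreamingBandit:
    """Arms arrive one at a time; pulling arm i returns a Bernoulli reward
    with unknown mean means[i]."""

    def __init__(self, means):
        self.means = means   # true (hidden) arm means
        self.pulls = 0       # total number of pulls made so far

    def pull(self, arm_index):
        self.pulls += 1
        return 1.0 if random.random() < self.means[arm_index] else 0.0


def run_single_pass(bandit, n_arms, memory_size, pulls_per_arm):
    """Single-pass baseline: sample each incoming arm a fixed number of times,
    keep at most `memory_size` arms in memory, and evict the empirically worst
    arm whenever the memory overflows. Only in-memory arms may be pulled."""
    memory = []  # list of (empirical_mean, arm_index), |memory| <= memory_size

    for arm in range(n_arms):                  # arms arrive in a stream
        rewards = [bandit.pull(arm) for _ in range(pulls_per_arm)]
        emp_mean = sum(rewards) / pulls_per_arm
        memory.append((emp_mean, arm))
        if len(memory) > memory_size:
            memory.remove(min(memory))         # a discarded arm is gone for good

    return max(memory)[1]                      # empirically best arm still in memory


if __name__ == "__main__":
    random.seed(0)
    means = [0.3, 0.5, 0.9, 0.4, 0.6, 0.2]
    bandit = StreamingBandit(means)
    best = run_single_pass(bandit, n_arms=len(means), memory_size=2, pulls_per_arm=200)
    print(f"arm returned: {best}, total pulls: {bandit.pulls}")
```

Evicting the empirically worst arm after each arrival keeps at most memory_size arms stored and means a discarded arm can never be pulled again, which is the defining constraint of the bounded arm-memory streaming model studied in the paper.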
Number of Pages: 13