Quantum greedy algorithms for multi-armed bandits

Cited: 0
Author
Ohno, Hiroshi [1]
Affiliation
[1] Toyota Cent Res & Dev Labs Inc, 41-1 Yokomichi, Nagakute, Aichi 4801192, Japan
Keywords
Multi-armed bandits; ε-greedy algorithm; MovieLens dataset; Quantum amplitude amplification; Regret analysis
DOI
10.1007/s11128-023-03844-2
Chinese Library Classification (CLC)
O4 [Physics]
Subject Classification Code
0702
Abstract
Multi-armed bandits are widely used in machine learning applications such as recommendation systems. Here, we implement two quantum versions of the ε-greedy algorithm, a popular algorithm for multi-armed bandits. One of the quantum greedy algorithms uses a quantum maximization algorithm, and the other is a simple algorithm that uses an amplitude encoding method as a quantum subroutine in place of the argmax operation in the ε-greedy algorithm. For the former algorithm, given a quantum oracle, the query complexity is O(√K) in each round, where K is the number of arms. For the latter algorithm, quantum parallelism is achieved through the quantum superposition of the arms, and the run-time complexity is O(K)/O(log K) in each round. Bernoulli reward distributions and the MovieLens dataset are used to evaluate the algorithms against their classical counterparts. The experimental results show that, for best arm identification, the performance of the quantum greedy algorithm is comparable with that of its classical counterparts.
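As the abstract describes, both quantum variants modify only the exploitation step of ε-greedy: the classical argmax over the empirical arm values is replaced by a quantum maximum-finding routine (about O(√K) oracle queries per round) or by an amplitude-encoding subroutine over the K arms. For reference, below is a minimal classical ε-greedy sketch for Bernoulli bandits with that argmax step made explicit; all function and variable names are illustrative and not taken from the paper.

import numpy as np

def epsilon_greedy(arm_probs, n_rounds=10000, epsilon=0.1, seed=0):
    # Classical epsilon-greedy baseline for Bernoulli bandits.
    # The paper's quantum variants replace only the argmax (exploitation)
    # step below: one with quantum maximum finding (~O(sqrt(K)) queries),
    # the other with amplitude encoding over the K arms. This sketch is
    # the classical reference point, not the paper's quantum algorithm.
    rng = np.random.default_rng(seed)
    K = len(arm_probs)
    counts = np.zeros(K)   # number of pulls per arm
    values = np.zeros(K)   # empirical mean reward per arm
    total_reward = 0.0
    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = int(rng.integers(K))       # explore: uniform random arm
        else:
            arm = int(np.argmax(values))     # exploit: classical argmax, O(K) scan
        reward = float(rng.random() < arm_probs[arm])   # Bernoulli reward draw
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # running mean update
        total_reward += reward
    return values, total_reward

# Example: five Bernoulli arms; the last arm has the highest mean reward.
est_means, cum_reward = epsilon_greedy([0.1, 0.3, 0.5, 0.7, 0.9])

A linear scan for the argmax costs O(K) per round; the quoted O(√K) query complexity of the quantum maximum-finding variant applies to exactly this step, assuming access to the quantum oracle described in the paper.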
Pages: 20
Related Papers
50 records in total
  • [31] Multi-Armed Bandits With Costly Probes
    Elumar, Eray Can
    Tekin, Cem
    Yagan, Osman
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2025, 71 (01) : 618 - 643
  • [32] On Optimal Foraging and Multi-armed Bandits
    Srivastava, Vaibhav
    Reverdy, Paul
    Leonard, Naomi E.
    2013 51ST ANNUAL ALLERTON CONFERENCE ON COMMUNICATION, CONTROL, AND COMPUTING (ALLERTON), 2013, : 494 - 499
  • [33] Multi-armed Bandits and the Gittins Index
    Whittle, P.
    JOURNAL OF THE ROYAL STATISTICAL SOCIETY SERIES B-METHODOLOGICAL, 1980, 42 (02) : 143 - 149
  • [34] Multi-armed bandits with episode context
    Rosin, Christopher D.
    Annals of Mathematics and Artificial Intelligence, 2011, 61 : 203 - 230
  • [35] Multi-armed bandits with switching penalties
    Asawa, M
    Teneketzis, D
    IEEE TRANSACTIONS ON AUTOMATIC CONTROL, 1996, 41 (03) : 328 - 348
  • [36] Active Learning in Multi-armed Bandits
    Antos, Andras
    Grover, Varun
    Szepesvari, Csaba
    ALGORITHMIC LEARNING THEORY, PROCEEDINGS, 2008, 5254 : 287 - +
  • [37] Multi-Armed Bandits With Correlated Arms
    Gupta, Samarth
    Chaudhari, Shreyas
    Joshi, Gauri
    Yagan, Osman
    IEEE TRANSACTIONS ON INFORMATION THEORY, 2021, 67 (10) : 6711 - 6732
  • [38] Multi-Armed Bandits with Cost Subsidy
    Sinha, Deeksha
    Sankararama, Karthik Abinav
    Kazerouni, Abbas
    Avadhanula, Vashist
    24TH INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS (AISTATS), 2021, 130
  • [39] Batched Multi-armed Bandits Problem
    Gao, Zijun
    Han, Yanjun
    Ren, Zhimei
    Zhou, Zhengqing
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [40] Are Multi-Armed Bandits Susceptible to Peeking?
    Loecher, Markus
    ZAGREB INTERNATIONAL REVIEW OF ECONOMICS & BUSINESS, 2018, 21 (01) : 95 - 104