Lenient Regret for Multi-Armed Bandits

Cited: 0
Authors
Merlis, Nadav [1 ]
Mannor, Shie [1 ,2 ]
Affiliations
[1] Technion Israel Inst Technol, Haifa, Israel
[2] Nvidia Res, Tel Aviv, Israel
Source
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2021 / Vol. 35
Funding
Israel Science Foundation;
Keywords
DOI
(not available)
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results. For example, in large problems, or when the interaction with the environment is brief, finding an optimal arm is infeasible, and regret-minimizing algorithms tend to over-explore. To overcome this issue, algorithms for such settings should instead focus on playing near-optimal arms. To this end, we suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some ε > 0. We then present a variant of the Thompson Sampling (TS) algorithm, called ε-TS, and prove its asymptotic optimality in terms of the lenient regret. Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of ε-TS is bounded by a constant. Finally, we show that ε-TS can be applied to improve the performance when the agent knows a lower bound of the suboptimality gaps.
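The lenient regret criterion from the abstract can be illustrated with a short sketch. This is not the paper's ε-TS algorithm: it runs standard Bernoulli Thompson Sampling and then measures lenient regret, i.e., cumulative regret in which any suboptimality gap below ε contributes zero. The function names, the Bernoulli reward model, and the example arm means are illustrative assumptions, not taken from the paper.

```python
import random

def lenient_regret(means, pulls, eps):
    """Cumulative lenient regret: gaps smaller than eps count as zero."""
    best = max(means)
    return sum(n * (best - m) for m, n in zip(means, pulls)
               if best - m >= eps)

def thompson_sampling(means, horizon, seed=0):
    """Standard Bernoulli Thompson Sampling; returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # Beta(1, 1) prior on each arm's mean
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a mean from each arm's Beta posterior and play the argmax.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Arm 1 is within eps = 0.05 of optimal, so its pulls incur no lenient regret;
# only pulls of the clearly suboptimal arm 2 are penalized.
means = [0.9, 0.88, 0.5]
pulls = thompson_sampling(means, horizon=2000)
print(lenient_regret(means, pulls, eps=0.05))
```

Under this criterion an algorithm is not punished for settling on a near-optimal arm, which matches the motivation above: when finding the exact best arm is infeasible, playing near-optimal arms should be good enough.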
Pages: 8950 - 8957
Page count: 8
Related Papers
(50 total)
  • [1] Fairness and Welfare Quantification for Regret in Multi-Armed Bandits
    Barman, Siddharth
    Khan, Arindam
    Maiti, Arnab
    Sawarni, Ayush
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 6762 - 6769
  • [2] Bounded Regret for Finitely Parameterized Multi-Armed Bandits
    Panaganti, Kishan
    Kalathil, Dileep
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (03): : 1073 - 1078
  • [3] Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
    Bar-On, Yogev
    Mansour, Yishay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Constrained regret minimization for multi-criterion multi-armed bandits
    Kagrecha, Anmol
    Nair, Jayakrishnan
    Jagannathan, Krishna
    MACHINE LEARNING, 2023, 112 (02) : 431 - 458
  • [5] Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk
    Chen, Tianrui
    Gangrade, Aditya
    Saligrama, Venkatesh
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [6] Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits
    Karpov, Nikolai
    Zhang, Qin
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13076 - 13084
  • [7] Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory
    Chaudhuri, Arghya Roy
    Kalyanakrishnan, Shivaram
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10085 - 10092
  • [8] On Kernelized Multi-armed Bandits
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [9] Regional Multi-Armed Bandits
    Wang, Zhiyang
    Zhou, Ruida
    Shen, Cong
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84