Lenient Regret for Multi-Armed Bandits

Cited: 0
Authors
Merlis, Nadav [1 ]
Mannor, Shie [1 ,2 ]
Affiliations
[1] Technion Israel Inst Technol, Haifa, Israel
[2] Nvidia Res, Tel Aviv, Israel
Source
THIRTY-FIFTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THIRTY-THIRD CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE AND THE ELEVENTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2021 / Vol. 35
Funding
Israel Science Foundation;
Keywords
DOI
(not available)
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Discipline Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We consider the Multi-Armed Bandit (MAB) problem, where an agent sequentially chooses actions and observes rewards for the actions it took. While the majority of algorithms try to minimize the regret, i.e., the cumulative difference between the reward of the best action and the agent's action, this criterion might lead to undesirable results. For example, in large problems, or when the interaction with the environment is brief, finding an optimal arm is infeasible, and regret-minimizing algorithms tend to over-explore. To overcome this issue, algorithms for such settings should instead focus on playing near-optimal arms. To this end, we suggest a new, more lenient regret criterion that ignores suboptimality gaps smaller than some ε > 0. We then present a variant of the Thompson Sampling (TS) algorithm, called ε-TS, and prove its asymptotic optimality in terms of the lenient regret. Importantly, we show that when the mean of the optimal arm is high enough, the lenient regret of ε-TS is bounded by a constant. Finally, we show that ε-TS can be applied to improve the performance when the agent knows a lower bound of the suboptimality gaps.
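The lenient regret criterion from the abstract can be illustrated with a short sketch. This is not the paper's ε-TS algorithm: it runs standard Bernoulli Thompson Sampling and then measures lenient regret, i.e., cumulative regret in which any suboptimality gap below ε contributes zero. The function names, the Bernoulli reward model, and the example arm means are illustrative assumptions, not taken from the paper.

```python
import random

def lenient_regret(means, pulls, eps):
    """Cumulative lenient regret: gaps smaller than eps count as zero."""
    best = max(means)
    return sum(n * (best - m) for m, n in zip(means, pulls)
               if best - m >= eps)

def thompson_sampling(means, horizon, seed=0):
    """Standard Bernoulli Thompson Sampling; returns per-arm pull counts."""
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k  # Beta(1, 1) prior on each arm's mean
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Sample a mean from each arm's Beta posterior and play the argmax.
        samples = [rng.betavariate(1 + successes[i], 1 + failures[i])
                   for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = 1 if rng.random() < means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls

# Arm 1 is within eps = 0.05 of optimal, so its pulls incur no lenient regret;
# only pulls of the clearly suboptimal arm 2 are penalized.
means = [0.9, 0.88, 0.5]
pulls = thompson_sampling(means, horizon=2000)
print(lenient_regret(means, pulls, eps=0.05))
```

Under this criterion an algorithm is not punished for settling on a near-optimal arm, which matches the motivation above: when finding the exact best arm is infeasible, playing near-optimal arms should be good enough.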
Pages: 8950 - 8957
Page count: 8
Related Papers
(50 total)
  • [1] Fairness and Welfare Quantification for Regret in Multi-Armed Bandits
    Barman, Siddharth
    Khan, Arindam
    Maiti, Arnab
    Sawarni, Ayush
    THIRTY-SEVENTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 37 NO 6, 2023, : 6762 - 6769
  • [2] Bounded Regret for Finitely Parameterized Multi-Armed Bandits
    Panaganti, Kishan
    Kalathil, Dileep
    IEEE CONTROL SYSTEMS LETTERS, 2021, 5 (03): : 1073 - 1078
  • [3] Individual Regret in Cooperative Nonstochastic Multi-Armed Bandits
    Bar-On, Yogev
    Mansour, Yishay
    ADVANCES IN NEURAL INFORMATION PROCESSING SYSTEMS 32 (NIPS 2019), 2019, 32
  • [4] Constrained regret minimization for multi-criterion multi-armed bandits
    Kagrecha, Anmol
    Nair, Jayakrishnan
    Jagannathan, Krishna
    MACHINE LEARNING, 2023, 112 (02) : 431 - 458
  • [5] Strategies for Safe Multi-Armed Bandits with Logarithmic Regret and Risk
    Chen, Tianrui
    Gangrade, Aditya
    Saligrama, Venkatesh
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 162, 2022
  • [6] Communication-Efficient Collaborative Regret Minimization in Multi-Armed Bandits
    Karpov, Nikolai
    Zhang, Qin
    THIRTY-EIGHTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, VOL 38 NO 12, 2024, : 13076 - 13084
  • [7] Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory
    Chaudhuri, Arghya Roy
    Kalyanakrishnan, Shivaram
    THIRTY-FOURTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE, THE THIRTY-SECOND INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE CONFERENCE AND THE TENTH AAAI SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE, 2020, 34 : 10085 - 10092
  • [8] On Kernelized Multi-armed Bandits
    Chowdhury, Sayak Ray
    Gopalan, Aditya
    INTERNATIONAL CONFERENCE ON MACHINE LEARNING, VOL 70, 2017, 70
  • [9] Regional Multi-Armed Bandits
    Wang, Zhiyang
    Zhou, Ruida
    Shen, Cong
    INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE AND STATISTICS, VOL 84, 2018, 84