Bandits with Knapsacks and Predictions
Cited by: 0
Authors: Drago, Davide [1]; Celli, Andrea [1]; Elias, Marek [1]
Affiliations: [1] Bocconi Univ, Milan, Italy
Source:
Keywords:
DOI: not available
Abstract:
We study the Bandits with Knapsacks problem with the aim of designing a learning-augmented online learning algorithm that upholds better regret guarantees than the state-of-the-art primal-dual algorithms with worst-case guarantees, under both stochastic and adversarial inputs. In the adversarial case, we obtain better competitive ratios when the input predictions are accurate, while also maintaining worst-case guarantees for imprecise predictions. We introduce two algorithms tailored for the full and bandit feedback settings, respectively. Both algorithms integrate a static prediction with a worst-case no-α-regret algorithm. This yields an optimized competitive ratio of (π + (1 − π)/α)^{-1} in scenarios where the prediction is perfect, and a competitive ratio of α/(1 − π) in the case of highly imprecise predictions, where π ∈ (0, 1) is chosen by the learner and α is Slater's parameter. We complement this analysis by studying the stochastic setting under full feedback. We provide an algorithm which guarantees a pseudo-regret of Õ(√T) with poor predictions, and zero pseudo-regret with perfect predictions.
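As a numerical illustration of the trade-off stated in the abstract, the following sketch evaluates the two competitive-ratio formulas for a sample choice of π and α. The function names are hypothetical (not from the paper); only the formulas themselves come from the abstract.

```python
# Illustration of the competitive ratios from the abstract.
# pi in (0, 1) is chosen by the learner; alpha is Slater's parameter.
# Function names below are illustrative, not from the paper.

def ratio_perfect_prediction(pi: float, alpha: float) -> float:
    """Competitive ratio (pi + (1 - pi)/alpha)^(-1) when the prediction is perfect."""
    return 1.0 / (pi + (1.0 - pi) / alpha)

def ratio_imprecise_prediction(pi: float, alpha: float) -> float:
    """Competitive ratio alpha/(1 - pi) under highly imprecise predictions."""
    return alpha / (1.0 - pi)

# Example: pi = 0.5, alpha = 2
print(ratio_perfect_prediction(0.5, 2.0))    # 1.333... (better than alpha alone)
print(ratio_imprecise_prediction(0.5, 2.0))  # 4.0 (worst-case fallback)
```

A larger π places more trust in the prediction: the perfect-prediction ratio improves toward 1, while the imprecise-prediction fallback α/(1 − π) degrades.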
Pages: 1189-1206
Page count: 18