Bandit algorithms have become a standard approach to learning better models online. As they grow in popularity, they are increasingly deployed in complex machine learning pipelines, where their actions can be overridden. For example, in ranking problems, a list of recommended items can be modified by a downstream algorithm to increase diversity. This can break classic bandit algorithms and lead to linear regret: if the proposed action is not taken, the uncertainty in its estimated mean reward may not be reduced. In this work, we study this setting and call it non-compliant bandits, because the agent tries to learn rewarding actions that comply with a downstream task. We propose two algorithms, compliant contextual UCB (CompUCB) and Thompson sampling (CompTS), which learn separate reward and compliance models. The compliance model allows the agent to avoid non-compliant actions. We derive a sublinear regret bound for CompUCB. We also conduct experiments that compare our algorithms to classic bandit baselines. The experiments show how the baselines fail and how learning compliance models mitigates these failures.
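To make the idea concrete, one illustrative (not necessarily the paper's exact) instantiation of a compliance-aware UCB rule restricts the usual optimistic choice to actions that an upper confidence bound on compliance deems likely to be kept:
\[
A_t \;=\; \arg\max_{a \,:\, \hat{c}_t(x_t, a) + \alpha\, u^c_t(x_t, a) \,\ge\, \tau} \; \Big( \hat{r}_t(x_t, a) + \alpha\, u^r_t(x_t, a) \Big),
\]
where $x_t$ is the context, $\hat{r}_t$ and $\hat{c}_t$ are the learned reward and compliance models, $u^r_t$ and $u^c_t$ their confidence widths, and $\alpha$ and $\tau$ are tuning parameters; this notation is assumed here for illustration only.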