CORRUPTED CONTEXTUAL BANDITS: ONLINE LEARNING WITH CORRUPTED CONTEXT

Cited by: 3
Authors
Bouneffouf, Djallel [1]
Affiliations
[1] IBM Thomas J Watson Res Ctr, Yorktown Hts, NY 10598 USA
Source
2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021) | 2021
Keywords
Online Learning; Contextual Bandit;
DOI
10.1109/ICASSP39728.2021.9414300
Chinese Library Classification
O42 [Acoustics];
Subject classification codes
070206 ; 082403 ;
Abstract
We consider a novel variant of the contextual bandit problem (i.e., the multi-armed bandit with side information, or context, available to a decision-maker) in which the context used at each decision may be corrupted ("useless context"). This new problem is motivated by certain online settings, including clinical trials and ad recommendation. To address the corrupted-context setting, we propose to combine the standard contextual bandit approach with a classical multi-armed bandit mechanism. Unlike standard contextual bandit methods, we are able to learn from all iterations, even those with corrupted context, by improving the estimate of the expected reward for each arm. Promising empirical results are obtained on several real-life datasets.
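The abstract does not say how the contextual and context-free mechanisms are combined, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes a hypothetical `context_ok` flag supplied by the caller to mark trusted contexts, and it uses LinUCB and UCB1 as stand-ins for the "standard contextual bandit approach" and the "classical multi-armed bandit mechanism". The class name `HybridBandit` and all parameter names are illustrative.

```python
import math


class HybridBandit:
    """Sketch: LinUCB on trusted contexts, UCB1 on corrupted ones (assumed design)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.n_arms, self.dim, self.alpha = n_arms, dim, alpha
        self.t = 0
        # Context-free statistics, updated on every round.
        self.counts = [0] * n_arms
        self.sums = [0.0] * n_arms
        # Per-arm inverse of the ridge Gram matrix (starts as the identity)
        # and reward-weighted context sums, updated on trusted rounds only.
        self.A_inv = [[[1.0 if i == j else 0.0 for j in range(dim)]
                       for i in range(dim)] for _ in range(n_arms)]
        self.b = [[0.0] * dim for _ in range(n_arms)]

    @staticmethod
    def _matvec(M, v):
        return [sum(m * x for m, x in zip(row, v)) for row in M]

    def select(self, context, context_ok):
        """Return the index of the arm to play this round."""
        self.t += 1
        scores = []
        for a in range(self.n_arms):
            if context_ok:
                # LinUCB score: estimated reward plus confidence width.
                theta = self._matvec(self.A_inv[a], self.b[a])
                ax = self._matvec(self.A_inv[a], context)
                mean = sum(t * x for t, x in zip(theta, context))
                width = math.sqrt(max(0.0, sum(x * y for x, y in zip(context, ax))))
                scores.append(mean + self.alpha * width)
            elif self.counts[a] == 0:
                # UCB1 plays every arm once before comparing bounds.
                scores.append(math.inf)
            else:
                mean = self.sums[a] / self.counts[a]
                bonus = math.sqrt(2.0 * math.log(self.t) / self.counts[a])
                scores.append(mean + bonus)
        return max(range(self.n_arms), key=scores.__getitem__)

    def update(self, arm, reward, context, context_ok):
        """Record the reward; the context-free estimate learns on every round."""
        self.counts[arm] += 1
        self.sums[arm] += reward
        if context_ok:
            # Sherman-Morrison rank-one update of the inverse Gram matrix.
            a_inv = self.A_inv[arm]
            ax = self._matvec(a_inv, context)
            denom = 1.0 + sum(x * y for x, y in zip(context, ax))
            for i in range(self.dim):
                for j in range(self.dim):
                    a_inv[i][j] -= ax[i] * ax[j] / denom
                self.b[arm][i] += reward * context[i]


# Usage: one corrupted round followed by one trusted round.
bandit = HybridBandit(n_arms=2, dim=2)
arm = bandit.select([0.0, 0.0], context_ok=False)
bandit.update(arm, reward=1.0, context=[0.0, 0.0], context_ok=False)
arm = bandit.select([1.0, 0.0], context_ok=True)
bandit.update(arm, reward=0.0, context=[1.0, 0.0], context_ok=True)
```

In this sketch the per-arm mean reward is refreshed on every round, which mirrors the abstract's claim of learning from iterations with corrupted context, while the linear model only sees contexts marked as trusted. How corruption is detected in practice is left open here.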
Pages: 3145-3149
Page count: 5
Related papers
23 in total
[1]  
Abbasi-Yadkori Y., 2012, P MACHINE LEARNING R, P1
[2]  
Agrawal Shipra, 2013, INT C MACH LEARN, DOI 10.5555/3042817.3043073
[3]   Finite-time analysis of the multiarmed bandit problem [J].
Auer, P ;
Cesa-Bianchi, N ;
Fischer, P .
MACHINE LEARNING, 2002, 47 (2-3) :235-256
[4]   Using multi-armed bandits to learn ethical priorities for online AI systems [J].
Balakrishnan, A. ;
Bouneffouf, D. ;
Mattei, N. ;
Rossi, F. .
IBM JOURNAL OF RESEARCH AND DEVELOPMENT, 2019, 63 (4-5)
[5]  
Balakrishnan A, 2019, AAAI CONF ARTIF INTE, P3
[6]  
Bouneffouf D., 2019, ARXIV190410040
[7]  
Bouneffouf D, 2017, PROCEEDINGS OF THE TWENTY-SIXTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, P1468
[8]   Bandit Models of Human Behavior: Reward Processing in Mental Disorders [J].
Bouneffouf, Djallel ;
Rish, Irina ;
Cecchi, Guillermo A. .
ARTIFICIAL GENERAL INTELLIGENCE: 10TH INTERNATIONAL CONFERENCE, AGI 2017, 2017, 10414 :237-248
[9]   Multi-armed bandit problem with known trend [J].
Bouneffouf, Djallel ;
Feraud, Raphael .
NEUROCOMPUTING, 2016, 205 :16-21
[10]  
Bouneffouf Djallel, 2020, ARXIV201009473