Budgeted Recommendation with Delayed Feedback

Times cited: 0
Authors
Liu, Kweiguu [1]
Maghsudi, Setareh [2]
Yokoo, Makoto [1]
Affiliations
[1] Kyushu Univ, Fac Informat Sci & Elect Engn, Fukuoka 8190395, Japan
[2] Ruhr Univ Bochum, Fac Elect Engn & Informat Technol, D-44801 Bochum, Germany
Source
GOOD PRACTICES AND NEW PERSPECTIVES IN INFORMATION SYSTEMS AND TECHNOLOGIES, VOL 3, WORLDCIST 2024 | 2024 / Vol. 987
Keywords
Budget Constraints; Delayed Feedback; Online Learning; Resource Allocation
DOI
10.1007/978-3-031-60221-4_20
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In a conventional contextual multi-armed bandit problem, the feedback (or reward) is observable immediately after an action. In many real-life situations, however, feedback is delayed, and the delay matters most in time-sensitive applications. Delay makes the exploration-exploitation dilemma harder, because the dilemma becomes coupled with the interplay between delays and limited resources. A limited budget aggravates the problem further by restricting how much exploration is possible. A motivating example is the distribution of medical supplies at the early stage of COVID-19: delayed testing results left insufficient information for learning, which degraded the efficiency of resource allocation. Motivated by such applications, we study the effect of delayed feedback on constrained contextual bandits. We develop a decision-making policy, delay-oriented resource allocation with learning (DORAL), to optimize the resource expenditure in a contextual multi-armed bandit problem with arm-dependent delayed feedback.
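The setting described in the abstract can be illustrated with a small simulation. The sketch below is not DORAL, whose details are not given in this record; it is a generic UCB-style rule adapted by assumption to a budget and to arm-dependent delays, and it omits contexts for brevity. The function name, the parameters (`means`, `costs`, `delays`, `budget`), the optimistic default for arms with no observed feedback, and the reward-per-cost index are all hypothetical choices made for illustration.

```python
import heapq
import math
import random


def run_delayed_budgeted_ucb(means, costs, delays, budget, seed=0):
    """Simulate a UCB-style rule under a budget with arm-dependent delays.

    Hypothetical illustration parameters (not taken from the paper):
      means[a]  - Bernoulli reward probability of arm a
      costs[a]  - positive cost paid immediately when arm a is pulled
      delays[a] - rounds that pass before arm a's reward is observed
    """
    rng = random.Random(seed)
    n_arms = len(means)
    pulls = [0] * n_arms           # every pull, observed or not
    observed = [0] * n_arms        # pulls whose feedback has arrived
    observed_sum = [0.0] * n_arms  # sum of rewards that have arrived
    pending = []                   # min-heap of (arrival_round, arm, reward)
    spent = 0.0
    total_reward = 0.0
    t = 0

    def index(a):
        # Unpulled arms are tried first.
        if pulls[a] == 0:
            return float("inf")
        # Assumption: stay optimistic (mean 1.0) until feedback arrives.
        mean = observed_sum[a] / observed[a] if observed[a] else 1.0
        bonus = math.sqrt(2.0 * math.log(t) / max(observed[a], 1))
        # Rank arms by optimistic reward per unit of cost.
        return (mean + bonus) / costs[a]

    while True:
        affordable = [a for a in range(n_arms) if spent + costs[a] <= budget]
        if not affordable:
            break  # budget exhausted for every arm
        t += 1
        # Reveal only the feedback whose delay has elapsed.
        while pending and pending[0][0] <= t:
            _, arm, reward = heapq.heappop(pending)
            observed[arm] += 1
            observed_sum[arm] += reward
        arm = max(affordable, key=index)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        pulls[arm] += 1
        spent += costs[arm]           # cost is paid now
        total_reward += reward        # reward accrues but is not yet seen
        heapq.heappush(pending, (t + delays[arm], arm, reward))

    return {
        "rounds": t,
        "spent": spent,
        "total_reward": total_reward,
        "unobserved": len(pending),
    }


if __name__ == "__main__":
    print(run_delayed_budgeted_ucb([0.8, 0.3], [1.0, 1.0], [5, 0], 50.0))
```

The point of the sketch is the asymmetry: the cost of a pull is charged at once, while its reward enters the estimates only after the arm's delay, so a long-delay arm keeps being judged on stale information while the budget drains.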
Pages: 202-213
Page count: 12