Deep Exploration for Recommendation Systems

Cited by: 1
Authors
Zhu, Zheqing [1]
Van Roy, Benjamin [1, 2]
Affiliations
[1] Stanford Univ, Meta AI, Stanford, CA 94305 USA
[2] Stanford Univ, Stanford, CA USA
Source
PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023 | 2023
Keywords
Reinforcement Learning; Recommendation Systems; Decision Making under Uncertainty
DOI
10.1145/3604915.3608855
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators and establish large improvements over existing algorithms.
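The contrast the abstract draws between learning from a single response and learning from subsequent behavior can be illustrated with a toy example. The sketch below is not the paper's algorithm or its simulators; the states, item names, and reward values are all hypothetical, chosen only to show how a myopic rule and a sequential (multi-step) value calculation can disagree when the payoff of a recommendation is delayed.

```python
# Toy two-step user session. Every state, item, and reward here is made up
# for illustration; none of it comes from the paper.
# TRANSITIONS[state][item] = (immediate_reward, next_state or None if the session ends)
TRANSITIONS = {
    "new_user": {
        "popular_item": (1.0, None),           # pays off now, reveals nothing
        "probe_item": (0.0, "engaged_user"),   # no payoff now, opens a better state
    },
    "engaged_user": {
        "popular_item": (1.0, None),
        "niche_item": (3.0, None),
    },
}


def myopic_choice(state):
    """Pick the item with the highest immediate reward (single-step view)."""
    actions = TRANSITIONS[state]
    return max(actions, key=lambda item: actions[item][0])


def q_value(state, item, discount=1.0):
    """Immediate reward plus the discounted best value of the follow-up state."""
    reward, next_state = TRANSITIONS[state][item]
    if next_state is None:
        return reward
    best_next = max(q_value(next_state, a, discount) for a in TRANSITIONS[next_state])
    return reward + discount * best_next


def sequential_choice(state, discount=1.0):
    """Pick the item with the highest multi-step value (sequential view)."""
    return max(TRANSITIONS[state], key=lambda item: q_value(state, item, discount))


if __name__ == "__main__":
    print("myopic:", myopic_choice("new_user"))
    print("sequential:", sequential_choice("new_user"))
```

In this toy, the rewards are known in advance, so it only shows why delayed feedback matters for the choice of recommendation; the exploration problem the abstract describes arises when those delayed rewards must themselves be discovered by probing.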
Pages: 963-970
Number of pages: 8