Deep Exploration for Recommendation Systems

Cited by: 1

Authors:

Zhu, Zheqing [1]

Van Roy, Benjamin [1,2]

Affiliations:

[1] Stanford Univ, Meta AI, Stanford, CA 94305 USA

[2] Stanford Univ, Stanford, CA USA

Source:

PROCEEDINGS OF THE 17TH ACM CONFERENCE ON RECOMMENDER SYSTEMS, RECSYS 2023 | 2023

Keywords:

Reinforcement Learning; Recommendation Systems; Decision Making under Uncertainty

DOI:

10.1145/3604915.3608855

Chinese Library Classification (CLC) number:

TP18 [Artificial Intelligence Theory]

Subject classification codes:

081104; 0812; 0835; 1405

Abstract:

Modern recommendation systems ought to benefit by probing for and learning from delayed feedback. Research has tended to focus on learning from a user's response to a single recommendation. Such work, which leverages methods of supervised and bandit learning, forgoes learning from the user's subsequent behavior. Where past work has aimed to learn from subsequent behavior, there has been a lack of effective methods for probing to elicit informative delayed feedback. Effective exploration through probing for delayed feedback becomes particularly challenging when rewards are sparse. To address this, we develop deep exploration methods for recommendation systems. In particular, we formulate recommendation as a sequential decision problem and demonstrate benefits of deep exploration over single-step exploration. Our experiments are carried out with high-fidelity industrial-grade simulators and establish large improvements over existing algorithms.


Pages: 963-970

Page count: 8

