Self-Supervised Reinforcement Learning for Recommender Systems

Cited by: 134
Authors
Xin, Xin [1 ,3 ]
Karatzoglou, Alexandros [2 ]
Arapakis, Ioannis [3 ]
Jose, Joemon M. [1 ]
Affiliations
[1] Univ Glasgow, Glasgow, Lanark, Scotland
[2] Google, London, England
[3] Tele Res, Barcelona, Spain
Source
PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020
Keywords
Session-based Recommendation; Sequential Recommendation; Reinforcement Learning; Self-supervised Learning; Q-learning;
DOI
10.1145/3397271.3401147
CLC number
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
In session-based or sequential recommendation, it is important to consider a number of factors, such as long-term user engagement and multiple types of user-item interactions (e.g., clicks and purchases). Current state-of-the-art supervised approaches fail to model these appropriately. Casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is training the agent through interactions with the environment. However, training a recommender in an online fashion is often problematic, since it requires exposing users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, yet challenging due to the purely off-policy setting and the lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer that drives the supervised layer to focus on specific rewards (e.g., recommending items that may lead to purchases rather than clicks), while the self-supervised layer with a cross-entropy loss provides strong gradient signals for parameter updates. Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
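The joint objective sketched in the abstract (a cross-entropy head plus a Q-learning head sharing one base model) can be illustrated as follows. This is a minimal pure-Python sketch under stated assumptions: the function names, the unweighted sum of the two losses, and the discount value are illustrative, not the paper's exact implementation.

```python
import math

def cross_entropy(logits, target):
    # Self-supervised head: softmax cross-entropy on the next item,
    # computed with the log-sum-exp trick for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def td_loss(q_s, q_next, action, reward, gamma=0.5):
    # RL head: one-step Q-learning with target r + gamma * max_a' Q(s', a').
    target = reward + gamma * max(q_next)
    return (target - q_s[action]) ** 2

def sqn_loss(logits, q_s, q_next, target_item, reward, gamma=0.5):
    # SQN-style joint objective: the TD term regularizes the supervised
    # term toward actions with higher reward (e.g., purchases over clicks).
    return cross_entropy(logits, target_item) + td_loss(
        q_s, q_next, target_item, reward, gamma)
```

In a full model, `logits` and `q_s` would come from two output layers on top of a shared sequential encoder (e.g., a GRU or self-attention model), and both losses would be minimized jointly by the same optimizer.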
Pages: 931-940
Page count: 10