Self-Supervised Reinforcement Learning for Recommender Systems

Cited by: 134
Authors
Xin, Xin [1 ,3 ]
Karatzoglou, Alexandros [2 ]
Arapakis, Ioannis [3 ]
Jose, Joemon M. [1 ]
Affiliations
[1] Univ Glasgow, Glasgow, Lanark, Scotland
[2] Google, London, England
[3] Tele Res, Barcelona, Spain
Source
PROCEEDINGS OF THE 43RD INTERNATIONAL ACM SIGIR CONFERENCE ON RESEARCH AND DEVELOPMENT IN INFORMATION RETRIEVAL (SIGIR '20), 2020
Keywords
Session-based Recommendation; Sequential Recommendation; Reinforcement Learning; Self-supervised Learning; Q-learning;
DOI
10.1145/3397271.3401147
CLC number
TP [Automation and Computer Technology]
Discipline code
0812
Abstract
In session-based or sequential recommendation, it is important to consider a number of factors, such as long-term user engagement and multiple types of user-item interactions (e.g., clicks and purchases). Current state-of-the-art supervised approaches fail to model these appropriately. Casting the sequential recommendation task as a reinforcement learning (RL) problem is a promising direction. A major component of RL approaches is training the agent through interactions with the environment. However, training a recommender in an online fashion is often problematic, since it requires exposing users to irrelevant recommendations. As a result, learning the policy from logged implicit feedback is of vital importance, yet challenging due to the purely off-policy setting and the lack of negative rewards (feedback). In this paper, we propose self-supervised reinforcement learning for sequential recommendation tasks. Our approach augments standard recommendation models with two output layers: one for self-supervised learning and the other for RL. The RL part acts as a regularizer that drives the supervised layer to focus on specific rewards (e.g., recommending items that may lead to purchases rather than clicks), while the self-supervised layer with a cross-entropy loss provides strong gradient signals for parameter updates. Based on this approach, we propose two frameworks, namely Self-Supervised Q-learning (SQN) and Self-Supervised Actor-Critic (SAC). We integrate the proposed frameworks with four state-of-the-art recommendation models. Experimental results on two real-world datasets demonstrate the effectiveness of our approach.
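The joint objective sketched in the abstract (a cross-entropy head plus a Q-learning head sharing one base model) can be illustrated as follows. This is a minimal pure-Python sketch under stated assumptions: the function names, the unweighted sum of the two losses, and the discount value are illustrative, not the paper's exact implementation.

```python
import math

def cross_entropy(logits, target):
    # Self-supervised head: softmax cross-entropy on the next item,
    # computed with the log-sum-exp trick for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]

def td_loss(q_s, q_next, action, reward, gamma=0.5):
    # RL head: one-step Q-learning with target r + gamma * max_a' Q(s', a').
    target = reward + gamma * max(q_next)
    return (target - q_s[action]) ** 2

def sqn_loss(logits, q_s, q_next, target_item, reward, gamma=0.5):
    # SQN-style joint objective: the TD term regularizes the supervised
    # term toward actions with higher reward (e.g., purchases over clicks).
    return cross_entropy(logits, target_item) + td_loss(
        q_s, q_next, target_item, reward, gamma)
```

In a full model, `logits` and `q_s` would come from two output layers on top of a shared sequential encoder (e.g., a GRU or self-attention model), and both losses would be minimized jointly by the same optimizer.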
Pages: 931-940
Page count: 10