Supervised Advantage Actor-Critic for Recommender Systems

Times cited: 14

Authors:

Xin, Xin [1]

Karatzoglou, Alexandros [2]

Arapakis, Ioannis [3]

Jose, Joemon M. [4]

Affiliations:

[1] Shandong Univ, Jinan, Peoples R China

[2] Google Res, London, England

[3] Tel Res, Barcelona, Spain

[4] Univ Glasgow, Glasgow, Lanark, Scotland

Source:

WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2022

Funding:

National Key Research and Development Program of China;

Keywords:

Recommendation; Reinforcement Learning; Actor-Critic; Q-learning; Advantage Actor-Critic; Negative Sampling;

DOI:

10.1145/3488560.3498494

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges like off-policy training, huge action spaces and lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values also depend heavily on the specific timestamp of a sequence. To address the above problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can be further utilized as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
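The advantage weighting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the "average case" is the mean Q-value over the positive item and the sampled negative items, and the helper names (`softmax_cross_entropy`, `advantage`, `advantage_weighted_loss`) and the toy numbers are hypothetical.

```python
import math
from typing import Sequence


def softmax_cross_entropy(logits: Sequence[float], target: int) -> float:
    """Cross-entropy of a softmax over `logits` for the item index `target`."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def advantage(q_values: Sequence[float], pos_item: int,
              neg_items: Sequence[int]) -> float:
    """Hypothetical helper: Q-value of the positive (observed) item minus the
    mean Q-value over the positive item and the sampled negative items.
    Using this mean as the 'average case' is an assumption of this sketch."""
    sampled = [pos_item, *neg_items]
    avg_q = sum(q_values[i] for i in sampled) / len(sampled)
    return q_values[pos_item] - avg_q


def advantage_weighted_loss(logits: Sequence[float], q_values: Sequence[float],
                            pos_item: int, neg_items: Sequence[int]) -> float:
    """Supervised next-item loss scaled by the advantage. In an autodiff
    framework the advantage would typically be treated as a constant, so the
    weight itself receives no gradient (an assumption, not from the source)."""
    weight = advantage(q_values, pos_item, neg_items)
    return weight * softmax_cross_entropy(logits, pos_item)


if __name__ == "__main__":
    q = [1.0, 3.0, 0.0, 2.0]        # toy Q(s, a) for a 4-item catalogue
    logits = [0.5, 2.0, -1.0, 0.0]  # toy next-item scores from the sequential model
    print(advantage(q, pos_item=1, neg_items=[0, 2]))
    print(advantage_weighted_loss(logits, q, pos_item=1, neg_items=[0, 2]))
```

In the toy run, the positive item's Q-value (3.0) exceeds the average over itself and the two sampled negatives (4/3) by 5/3, so its cross-entropy term is scaled up by that factor.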


Pages: 1186-1196

Page count: 11
