Supervised Advantage Actor-Critic for Recommender Systems

Times cited: 14
Authors
Xin, Xin [1]
Karatzoglou, Alexandros [2]
Arapakis, Ioannis [3]
Jose, Joemon M. [4]
Affiliations
[1] Shandong University, Jinan, People's Republic of China
[2] Google Research, London, England
[3] Telefonica Research, Barcelona, Spain
[4] University of Glasgow, Glasgow, Lanark, Scotland
Source
WSDM'22: PROCEEDINGS OF THE FIFTEENTH ACM INTERNATIONAL CONFERENCE ON WEB SEARCH AND DATA MINING | 2022
Funding
National Key R&D Program of China
Keywords
Recommendation; Reinforcement Learning; Actor-Critic; Q-learning; Advantage Actor-Critic; Negative Sampling
DOI
10.1145/3488560.3498494
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Casting session-based or sequential recommendation as reinforcement learning (RL) through reward signals is a promising research direction towards recommender systems (RS) that maximize cumulative profits. However, the direct use of RL algorithms in the RS setting is impractical due to challenges such as off-policy training, huge action spaces and a lack of sufficient reward signals. Recent RL approaches for RS attempt to tackle these challenges by combining RL and (self-)supervised sequential learning, but they still suffer from certain limitations. For example, the estimation of Q-values tends to be biased toward positive values due to the lack of negative reward signals. Moreover, the Q-values depend heavily on the specific timestamp of a sequence. To address these problems, we propose a negative sampling strategy for training the RL component and combine it with supervised sequential learning. We call this method Supervised Negative Q-learning (SNQN). Based on sampled (negative) actions (items), we can calculate the "advantage" of a positive action over the average case, which can then be used as a normalized weight for learning the supervised sequential part. This leads to another learning framework: Supervised Advantage Actor-Critic (SA2C). We instantiate SNQN and SA2C with four state-of-the-art sequential recommendation models and conduct experiments on two real-world datasets. Experimental results show that the proposed approaches achieve significantly better performance than state-of-the-art supervised methods and existing self-supervised RL methods.
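The abstract describes two training signals: negative sampling for the Q-learning head (SNQN) and an "advantage" that re-weights the supervised next-item loss (SA2C). The Python sketch below illustrates both ideas on plain lists of scores. It is not the authors' implementation: every function name is hypothetical, the lists `q_s`, `q_next` and `logits` stand in for outputs of a sequential recommendation model, uniform negative sampling is assumed, and sampled negatives are given a fixed reward of 0 with no bootstrap term as a simplification. The paper's exact targets, sampling distribution and normalization may differ.

```python
import math
import random
from typing import List, Sequence


def sample_negatives(num_items: int, positive: int, k: int, rng: random.Random) -> List[int]:
    """Uniformly sample k distinct item ids, excluding the positive (interacted) item.

    Assumption: uniform sampling without replacement; the paper may sample differently.
    """
    if k > num_items - 1:
        raise ValueError("k must be at most num_items - 1")
    chosen: List[int] = []
    seen = {positive}
    while len(chosen) < k:
        candidate = rng.randrange(num_items)
        if candidate not in seen:
            seen.add(candidate)
            chosen.append(candidate)
    return chosen


def negative_q_loss(q_s: Sequence[float], q_next: Sequence[float], positive: int,
                    negatives: Sequence[int], reward: float, gamma: float,
                    negative_reward: float = 0.0) -> float:
    """Squared TD error on the positive action plus squared error on sampled negatives.

    Simplifying assumption: negatives get `negative_reward` and no bootstrap term,
    which pushes their Q-values down and counters the bias toward positive values.
    """
    target = reward + gamma * max(q_next)  # one-step Q-learning target
    loss = (target - q_s[positive]) ** 2
    for a in negatives:
        loss += (negative_reward - q_s[a]) ** 2
    return loss


def advantage(q_s: Sequence[float], positive: int, negatives: Sequence[int]) -> float:
    """Q-value of the positive action minus the mean Q over the positive and sampled negatives."""
    actions = [positive, *negatives]
    baseline = sum(q_s[a] for a in actions) / len(actions)
    return q_s[positive] - baseline


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Numerically stable -log softmax(logits)[target]."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def sa2c_actor_loss(logits: Sequence[float], target: int, adv: float) -> float:
    """Supervised next-item cross-entropy weighted by the advantage (treated as a constant)."""
    return adv * cross_entropy(logits, target)
```

For example, with `q_s = [1.0, 2.0, 0.5, 3.0]`, positive item 1 and negatives `[0, 2]`, the advantage is 2.0 minus the mean 3.5/3, so the supervised loss for that step is scaled up by roughly 0.83. In a real model these quantities would be computed on batched tensors in an autodiff framework, typically with gradients stopped through the advantage so that it acts only as a weight; that detail is likewise an assumption here.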
Pages: 1186-1196
Number of pages: 11