A Stable Deep Reinforcement Learning Framework for Recommendation

Cited by: 2
Authors
Liu, Ruochen [1]
Jiang, Dawei [1]
Zhang, Xilong [1]
Affiliations
[1] Xidian Univ, Key Lab Intelligent Percept & Image Understanding, Minist Educ, Xian 710071, Shaanxi, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Reinforcement learning; Data models; Intelligent systems; Training data; Entropy; Stability analysis; Optimization
DOI
10.1109/MIS.2022.3145503
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Recommender systems (RSs) address the problem of information overload and are crucial in industrial applications. Recently, combining reinforcement learning (RL) with RSs has attracted researchers' attention. These methods model the interaction between the RS and its users as a sequential decision-making process. However, existing studies suffer from two disadvantages: 1) they fail to model the accumulated long-term interest tied to high reward, and 2) they need a large amount of interaction data to learn a good policy and are unstable in recommendation scenarios. In this article, we propose a stable reinforcement learning framework for recommendation. First, we redefine the Markov decision process of RL-based recommendation and add a stable module to model users' high-feedback behavior. Second, we introduce an advanced RL algorithm to ensure both stability and exploration. Experiments verify the effectiveness of the proposed algorithm.
Pages: 76-84
Page count: 9