DEN-DQL: Quick Convergent Deep Q-Learning with Double Exploration Networks for News Recommendation

Cited by: 0
Authors
Song, Zhanghan [1]
Zhang, Dian [1]
Shi, Xiaochuan [1]
Li, Wei [2]
Ma, Chao [1]
Wu, Libing [1]
Affiliations
[1] Wuhan Univ, Sch Cyber Sci & Engn, Wuhan, Peoples R China
[2] Jiangnan Univ, Sch Artificial Intelligence & Comp Sci, Wuxi, Jiangsu, Peoples R China
Source
2021 INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS (IJCNN) | 2021
Keywords
Reinforcement Learning; Deep Q-Learning; News Recommendation; Double Exploration Networks
DOI
10.1109/IJCNN52387.2021.9533818
Chinese Library Classification (CLC) number
TP18 [Artificial Intelligence Theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Due to the dynamic characteristics of news and user preferences, personalized recommendation is a challenging problem. Traditional recommendation methods focus only on the current reward: they recommend items that maximize the number of immediate clicks, which may reduce users' interest in similar items. Although the previously proposed news recommendation framework based on deep reinforcement learning (i.e., DRL, based on deep Q-learning) has the advantages of focusing on total future rewards and supporting dynamic interactive recommendation, it has two issues. First, its exploration method is slow to converge, which may give new users a bad experience. Second, it is hard to train on an offline data set because the reward is difficult to determine. To address these issues, we propose DEN-DQL, a news recommendation framework based on deep Q-learning with double exploration networks. We also develop a new method for calculating rewards and use an offline data set to simulate the online news-clicking environment in which DEN-DQL is trained. The trained DEN-DQL is then tested in the online environment of the same data set, where it demonstrates an improvement of at least 10%.
Pages: 8