A Deep Reinforcement Learning Approach to Proactive Content Pushing and Recommendation for Mobile Users

Cited: 27
Authors
Liu, Dong [1 ]
Yang, Chenyang [1 ]
Affiliations
[1] Beihang University (BUAA), School of Electronics and Information Engineering, Beijing 100191, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
Wireless edge caching; content recommendation; pushing; deep reinforcement learning; caching policy; networks; placement; delivery
DOI
10.1109/ACCESS.2019.2925019
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
The gain from proactive caching at mobile devices relies heavily on accurately predicting user demands and mobility, which is hard to achieve owing to the randomness of user behavior. In this paper, we leverage personalized content recommendation to reduce the uncertainty in the requests users send. We formulate a joint content pushing and recommendation problem that maximizes the net profit of a mobile network operator. To cope with the challenges in modeling and learning user behavior, we establish a reinforcement learning (RL) framework to solve the problem. To circumvent the curse of dimensionality of RL for the joint problem, which has very large action and state spaces, we decompose the original problem into two RL problems in which two agents with different goals operate together, and we limit the number of possible actions in each state of the pushing agent by harnessing the well-learned recommendation policy. To enable action values to generalize from experienced states to unexperienced states via function approximation, we design a suitable representation of the pushing agent's state and action. We then solve the two problems with a double deep Q-network (DQN) with a dueling architecture. Simulation results show that the learned recommendation and pushing policies converge and increase the net profit significantly compared with baseline policies.
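For readers unfamiliar with the solution machinery, the sketch below illustrates the dueling double-DQN update the abstract refers to. It is a minimal, self-contained example assuming PyTorch; the network sizes, the names DuelingQNet and double_dqn_target, and the toy state/action dimensions are illustrative placeholders, not the paper's actual state and action representation for the pushing and recommendation agents.

    # Minimal dueling double-DQN sketch (PyTorch). All names and dimensions
    # are illustrative assumptions, not the paper's actual design.
    import torch
    import torch.nn as nn

    class DuelingQNet(nn.Module):
        """Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')."""
        def __init__(self, state_dim: int, n_actions: int, hidden: int = 64):
            super().__init__()
            self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
            self.value = nn.Linear(hidden, 1)              # state value V(s)
            self.advantage = nn.Linear(hidden, n_actions)  # advantages A(s, a)

        def forward(self, s: torch.Tensor) -> torch.Tensor:
            h = self.trunk(s)
            v, a = self.value(h), self.advantage(h)
            # Subtracting the mean advantage makes V and A identifiable.
            return v + a - a.mean(dim=1, keepdim=True)

    def double_dqn_target(online: DuelingQNet, target: DuelingQNet,
                          r, s_next, done, gamma: float = 0.99):
        """Double DQN: the online net selects the greedy next action,
        the target net evaluates it, reducing overestimation bias."""
        with torch.no_grad():
            best_a = online(s_next).argmax(dim=1, keepdim=True)
            q_next = target(s_next).gather(1, best_a).squeeze(1)
        return r + gamma * (1.0 - done) * q_next

    # Toy usage with random transitions (dimensions are placeholders):
    state_dim, n_actions, batch = 8, 4, 32
    online = DuelingQNet(state_dim, n_actions)
    target_net = DuelingQNet(state_dim, n_actions)
    target_net.load_state_dict(online.state_dict())
    s = torch.randn(batch, state_dim)
    a = torch.randint(0, n_actions, (batch, 1))
    r = torch.randn(batch)
    s_next = torch.randn(batch, state_dim)
    done = torch.zeros(batch)
    q_sa = online(s).gather(1, a).squeeze(1)
    loss = nn.functional.mse_loss(q_sa, double_dqn_target(online, target_net, r, s_next, done))
    loss.backward()

In this update, the online network chooses the next action while the separate target network scores it, and the dueling head learns the state value independently of per-action advantages; both refinements address the Q-value overestimation that plain DQN suffers from in large action spaces like the pushing problem described above.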
Pages: 83120-83136
Page count: 17