Function Approximation Based Reinforcement Learning for Edge Caching in Massive MIMO Networks

Cited by: 12
Authors
Garg, Navneet [1]
Sellathurai, Mathini [2]
Bhatia, Vimal [3]
Ratnarajah, Tharmalingam [1]
Affiliations
[1] Univ Edinburgh, Sch Engn, Edinburgh EH8 9YL, Midlothian, Scotland
[2] Heriot Watt Univ, Sch Engn & Phys Sci, Edinburgh EH14 4AS, Midlothian, Scotland
[3] Indian Inst Technol Indore, Dept Elect Engn, Indore 452017, India
Funding
Engineering and Physical Sciences Research Council (EPSRC), UK
Keywords
Linear function approximation; massive MIMO; non-linear function approximation; Poisson point process; Q-learning; wireless edge caching; POPULARITY PREDICTION; CONTENT PLACEMENT; OPTIMIZATION; DELIVERY; CLOUD;
DOI
10.1109/TCOMM.2020.3047658
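Chinese Library Classification (CLC) number
TM [Electrical engineering]; TN [Electronics and communication technology]
Discipline classification code
0808; 0809
Abstract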
Caching popular content in advance is an important technique for achieving low latency and reducing backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multiple-input multiple-output (MIMO) system is considered, in which the locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, the average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and unknown in advance. Further, modeling CP variations across time as a Markov process, reinforcement Q-learning is employed to learn the optimal content placement strategy that optimizes the long-term discounted ASP and the average cache refresh rate. In Q-learning, the number of Q-updates is large and proportional to the number of states and actions. To reduce the space complexity and update requirements and make Q-learning scalable, two novel function-approximation-based Q-learning approaches (linear and non-linear) are proposed, in which only a constant number of variables (4 and 3, respectively) need to be updated, irrespective of the number of states and actions. The convergence of these approximation-based approaches is analyzed. Simulations verify that both approaches converge and learn similar best content placements, which demonstrates the applicability and scalability of the proposed approximated Q-learning schemes.
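To illustrate the general idea of replacing a Q-table with a fixed number of parameters, the following is a minimal sketch of Q-learning with linear function approximation. It is not the scheme from the article: the 4-dimensional feature map, the toy Markov chain over popularity states, the reward (a stand-in for ASP), and all names (`features`, `td_update`, `train`, `N_STATES`, `N_ACTIONS`) are assumptions chosen only for illustration.

```python
import numpy as np

N_STATES = 5   # hypothetical number of content-popularity states
N_ACTIONS = 5  # hypothetical number of candidate placement actions


def features(state, action):
    """Hypothetical 4-dimensional feature vector for a (state, action) pair."""
    s = state / (N_STATES - 1)
    a = action / (N_ACTIONS - 1)
    return np.array([1.0, s, a, s * a])


def q_values(theta, state):
    """Approximate Q(state, a) = theta . features(state, a) for every action a."""
    return np.array([theta @ features(state, a) for a in range(N_ACTIONS)])


def td_update(theta, state, action, reward, next_state, alpha=0.05, gamma=0.9):
    """One semi-gradient Q-learning step; only the 4 entries of theta change."""
    phi = features(state, action)
    target = reward + gamma * q_values(theta, next_state).max()
    return theta + alpha * (target - theta @ phi) * phi


def train(steps=5000, eps=0.1, seed=0):
    """Run epsilon-greedy Q-learning on a toy Markov popularity process."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(4)
    state = 0
    for _ in range(steps):
        # Epsilon-greedy action selection.
        if rng.random() < eps:
            action = int(rng.integers(N_ACTIONS))
        else:
            action = int(np.argmax(q_values(theta, state)))
        # Toy reward: highest when the placement index matches the state.
        reward = 1.0 - abs(state - action) / (N_STATES - 1)
        # Toy Markov transition: stay with probability 0.8, else jump uniformly.
        next_state = state if rng.random() < 0.8 else int(rng.integers(N_STATES))
        theta = td_update(theta, state, action, reward, next_state)
        state = next_state
    return theta
```

Whatever the values of `N_STATES` and `N_ACTIONS`, each step updates only the four entries of `theta`, whereas a tabular method would store one entry per state-action pair. The quality of the learned policy depends entirely on the choice of features, which here is arbitrary.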
Pages: 2304-2316
Number of pages: 13