Function Approximation Based Reinforcement Learning for Edge Caching in Massive MIMO Networks

Cited by: 12

Authors:

Garg, Navneet [1]

Sellathurai, Mathini [2]

Bhatia, Vimal [3]

Ratnarajah, Tharmalingam [1]

Affiliations:

[1] Univ Edinburgh, Sch Engn, Edinburgh EH8 9YL, Midlothian, Scotland

[2] Heriot Watt Univ, Sch Engn & Phys Sci, Edinburgh EH14 4AS, Midlothian, Scotland

[3] Indian Inst Technol Indore, Dept Elect Engn, Indore 452017, India

Source:

IEEE TRANSACTIONS ON COMMUNICATIONS | 2021 / Vol. 69 / Issue 04

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Linear function approximation; massive MIMO; non-linear function approximation; Poisson point process; Q-learning; wireless edge caching; POPULARITY PREDICTION; CONTENT PLACEMENT; OPTIMIZATION; DELIVERY; CLOUD;

DOI:

10.1109/TCOMM.2020.3047658

Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Caching popular content in advance is an important technique for achieving low latency and reduced backhaul congestion in future wireless communication systems. In this article, a multi-cell massive multiple-input multiple-output (MIMO) system is considered, where the locations of base stations are distributed as a Poisson point process. Assuming probabilistic caching, the average success probability (ASP) of the system is derived for a known content popularity (CP) profile, which in practice is time-varying and unknown in advance. Further, modeling CP variations across time as a Markov process, reinforcement Q-learning is employed to learn the optimal content placement strategy that optimizes the long-term discounted ASP and the average cache refresh rate. In Q-learning, the number of Q-updates is large and proportional to the number of states and actions. To reduce the space complexity and update requirements and thereby make Q-learning scalable, two novel function-approximation-based Q-learning approaches (linear and non-linear) are proposed, in which only a constant number of variables (4 and 3, respectively) need updating, irrespective of the number of states and actions. The convergence of these approximation-based approaches is analyzed. Simulations verify that these approaches converge and learn similar best content placements, which demonstrates the applicability and scalability of the proposed approximated Q-learning schemes.
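To illustrate the idea of updating a fixed number of weights instead of a full Q-table, here is a minimal sketch of one semi-gradient Q-learning step with a linear approximation Q(s, a) ≈ w · φ(s, a). The four-feature map `features`, the function names, and the step size and discount values are assumptions made for illustration only; the abstract does not specify the authors' features or parameters, so this should not be read as their method.

```python
from typing import List, Sequence


def features(state: int, action: int, n_states: int, n_actions: int) -> List[float]:
    """Hypothetical 4-feature map phi(s, a); an illustrative choice, not the paper's."""
    s = state / max(n_states - 1, 1)    # normalised state index in [0, 1]
    a = action / max(n_actions - 1, 1)  # normalised action index in [0, 1]
    return [1.0, s, a, s * a]


def q_value(w: Sequence[float], phi: Sequence[float]) -> float:
    """Approximate Q(s, a) as the inner product of weights and features."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def td_update(w: Sequence[float], state: int, action: int, reward: float,
              next_state: int, n_states: int, n_actions: int,
              alpha: float = 0.1, gamma: float = 0.9) -> List[float]:
    """One semi-gradient Q-learning step; only the 4 weights in w change."""
    phi = features(state, action, n_states, n_actions)
    # Greedy value of the next state under the current weights.
    best_next = max(
        q_value(w, features(next_state, a, n_states, n_actions))
        for a in range(n_actions)
    )
    td_error = reward + gamma * best_next - q_value(w, phi)
    return [wi + alpha * td_error * pi for wi, pi in zip(w, phi)]


# Usage: start from zero weights and apply a single observed transition.
w = [0.0, 0.0, 0.0, 0.0]
w = td_update(w, state=2, action=4, reward=1.0, next_state=3,
              n_states=5, n_actions=5)
```

The weight vector keeps length 4 however large `n_states` and `n_actions` become, which is the scalability property the abstract describes. A tabular method would instead store and update `n_states * n_actions` entries.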

Pages: 2304-2316

Page count: 13
