Dynamic SDN-Based Radio Access Network Slicing With Deep Reinforcement Learning for URLLC and eMBB Services

Cited by: 61

Authors:

Filali, Abderrahime ^{[1]}

Mlika, Zoubeir ^{[1]}

Cherkaoui, Soumaya ^{[1,2]}

Kobbane, Abdellatif ^{[3]}

Affiliations:

[1] Univ Sherbrooke, INTERLAB Res Lab, Dept Elect & Comp Sci Engn, Sherbrooke, PQ J1K 2R1, Canada

[2] Polytech Montreal, Dept Comp & Software Engn, Montreal, PQ H3T 1J4, Canada

[3] Univ Mohammed V Rabat, ENSIAS, Rabat 10500, Morocco

Source:

IEEE TRANSACTIONS ON NETWORK SCIENCE AND ENGINEERING | 2022 / Vol. 9 / No. 4

Keywords:

Resource management; Ultra reliable low latency communication; Quality of service; 5G mobile communication; Optimization; Radio access networks; Heuristic algorithms; Network slicing; software defined networking; URLLC; eMBB; deep reinforcement learning; 5G; INTERNET;

DOI:

10.1109/TNSE.2022.3157274

Chinese Library Classification (CLC) number:

T [Industrial Technology];

Discipline classification code:

08;

Abstract:

Radio access network (RAN) slicing is a key technology that enables 5G networks to support the heterogeneous requirements of generic services, namely ultra-reliable low-latency communication (URLLC) and enhanced mobile broadband (eMBB). In this paper, we propose a two-time-scale RAN slicing mechanism to optimize the performance of URLLC and eMBB services. On the large time-scale, an SDN controller allocates radio resources to gNodeBs according to the requirements of the eMBB and URLLC services. On the short time-scale, each gNodeB allocates its available resources to its end-users and requests, if needed, additional resources from adjacent gNodeBs. We formulate this problem as a non-linear binary program and prove its NP-hardness. Next, for each time-scale, we model the problem as a Markov decision process (MDP): the large time-scale is modeled as a single-agent MDP, whereas the short time-scale is modeled as a multi-agent MDP. We leverage the exponential-weight algorithm for exploration and exploitation (EXP3) to solve the single-agent MDP of the large time-scale and the multi-agent deep Q-learning (DQL) algorithm to solve the multi-agent MDP of the short time-scale resource allocation. Extensive simulations show that our approach is efficient under different network parameter configurations and that it outperforms recent benchmark solutions.
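
Illustrative note: the abstract names EXP3 as the learner on the large time-scale but does not give the action set, reward, or parameters. The following is a minimal sketch of the textbook EXP3 update only, not the authors' implementation. The class name `Exp3`, the exploration rate `gamma`, the number of arms, and the toy reward in the usage note are all assumptions made for illustration; here an "arm" could stand for one candidate split of radio resources among gNodeBs.

```python
import math
import random


class Exp3:
    """Textbook EXP3 bandit for rewards in [0, 1] (illustrative sketch)."""

    def __init__(self, n_arms, gamma=0.1, seed=None):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must be in (0, 1]")
        self.n_arms = n_arms          # hypothetical: number of candidate allocations
        self.gamma = gamma            # assumed exploration rate
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [
            (1.0 - self.gamma) * w / total + self.gamma / self.n_arms
            for w in self.weights
        ]

    def select_arm(self):
        probs = self.probabilities()
        return self.rng.choices(range(self.n_arms), weights=probs, k=1)[0]

    def update(self, arm, reward):
        # Importance-weighted reward estimate for the arm that was played.
        probs = self.probabilities()
        estimate = reward / probs[arm]
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n_arms)
        # Rescale by the largest weight to avoid overflow; the resulting
        # probabilities are unchanged because only weight ratios matter.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]
```

As a usage sketch, an agent created with `Exp3(n_arms=3, seed=0)` can be run in a loop that calls `select_arm()`, observes a reward in [0, 1] (for instance a normalized QoS score, which is an assumption here), and passes it to `update(arm, reward)`. Because of the uniform mixing term, every arm keeps a selection probability of at least `gamma / n_arms`.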


Pages: 2174-2187

Page count: 14

References (39 in total):

[1] [Anonymous], 2019, TS382115GNR 3GPP

[2] Auer P, 2003, SIAM J COMPUT, V32, P48, DOI 10.1137/S0097539701398375

[3] Multi-Tenant Cross-Slice Resource Orchestration: A Deep Reinforcement Learning Approach [J].