Scalable end-to-end slice embedding and reconfiguration based on independent DQN agents

Cited by: 5
Authors
Doanis, Pavlos [1]
Giannakas, Theodoros [2]
Spyropoulos, Thrasyvoulos [1,3]
Affiliations
[1] EURECOM, Biot, France
[2] Huawei Technologies, Paris Research Center, Boulogne, France
[3] Technical University of Crete, Chania, Greece
Source
2022 IEEE Global Communications Conference (GLOBECOM 2022), 2022
Funding
EU Horizon 2020
Keywords
VNF placement; Network Slicing; 5G Networks; Reinforcement Learning; Deep Q-Network
DOI
10.1109/GLOBECOM48099.2022.10001068
Chinese Library Classification
TP [Automation technology; computer technology]
Discipline Code
0812
Abstract
Network slicing in beyond-5G systems facilitates the creation of customized virtual networks/services, referred to as "slices", on top of the physical network infrastructure. Efficient and dynamic orchestration of slices is needed to satisfy the stringent and diverse service level agreements (SLAs) required by different services. In this paper, we provide a model that attempts to capture the problem of dynamic slice embedding and reconfiguration, supporting a multi-domain setup and diverse, end-to-end SLAs. We then show that such problems can, in theory, be solved optimally with (tabular) Reinforcement Learning algorithms (e.g., Q-learning), even under a priori unknown demand dynamics for each slice. Nevertheless, the state and action complexity of such algorithms is prohibitive, even for very small scenarios. To this end, we propose a novel scheme based on independent DQN agents: the DQN component implements approximate Q-learning, based on simple, generic DNNs for value-function approximation, radically reducing state-space complexity; the independent agents then tackle the equally important issue of exploding action complexity, which arises from the combinatorial nature of embedding multiple VNFs per slice and multiple slices over multiple domains and the computing nodes therein. Using realistic data, we show that the proposed algorithm reduces convergence time by orders of magnitude with minimal penalty to decision optimality.
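As a rough illustration of the scheme the abstract describes, the sketch below (plain PyTorch; all names, dimensions, and the reward handling are assumptions for illustration, not the authors' implementation) instantiates one small, generic MLP-based DQN per slice. Each agent picks an embedding action for its own slice only, so a joint decision costs K single-agent choices instead of enumerating the |A|^K combinatorial action space a single tabular learner would face. Experience replay and target networks, standard in full DQN, are omitted for brevity.

```python
import random
import torch
import torch.nn as nn

class QNet(nn.Module):
    """Small generic MLP as the Q-value approximator (replaces the tabular Q)."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

class SliceAgent:
    """Independent DQN agent deciding the embedding of a single slice."""
    def __init__(self, state_dim, n_actions, lr=1e-3, gamma=0.99):
        self.q = QNet(state_dim, n_actions)
        self.opt = torch.optim.Adam(self.q.parameters(), lr=lr)
        self.gamma = gamma
        self.n_actions = n_actions

    def act(self, state, eps=0.1):
        # Epsilon-greedy over this agent's *own* action set only.
        if random.random() < eps:
            return random.randrange(self.n_actions)
        with torch.no_grad():
            return int(self.q(state).argmax())

    def update(self, s, a, r, s_next):
        # One-step Q-learning target; replay buffer / target net omitted.
        target = r + self.gamma * self.q(s_next).max().detach()
        loss = (self.q(s)[a] - target) ** 2
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()

# Toy setup: K slices, each with N_ACTIONS candidate placements
# (e.g., flattened domain x node choices for its VNF chain).
K, STATE_DIM, N_ACTIONS = 3, 8, 6
agents = [SliceAgent(STATE_DIM, N_ACTIONS) for _ in range(K)]
state = torch.randn(STATE_DIM)                   # shared toy observation
joint_action = [ag.act(state) for ag in agents]  # K choices, not N_ACTIONS**K
```

With K slices and N_ACTIONS placements each, a joint learner must rank N_ACTIONS**K actions per step, while the independent agents evaluate only K * N_ACTIONS Q-values, which is the action-space reduction behind the abstract's claim of orders-of-magnitude faster convergence.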
Pages: 3429-3434
Page count: 6