Deep Reinforcement Learning for Resource Constrained Multiclass Scheduling in Wireless Networks

Cited by: 1
Authors
Avranas, Apostolos [1 ]
Ciblat, Philippe [2 ]
Kountouris, Marios [3 ]
Affiliations
[1] Amadeus SAS, F-06902 Sophia Antipolis, France
[2] Inst Polytech Paris, Telecom Paris, Lab Traitement & Commun Informat LTCI, F-91120 Palaiseau, France
[3] EURECOM, Commun Syst Dept, F-06904 Sophia Antipolis, France
Source
IEEE TRANSACTIONS ON MACHINE LEARNING IN COMMUNICATIONS AND NETWORKING | 2023, Vol. 1
Keywords
Resource management; Quality of service; Optimization; Fading channels; Dynamic scheduling; Heuristic algorithms; Reinforcement learning; Deep reinforcement learning; deep sets; QoS traffic scheduling; multiclass services; dynamic resource allocation; POWER ALLOCATION; NEURAL-NETWORKS; DESIGN;
DOI
10.1109/TMLCN.2023.3314705
Chinese Library Classification
TP18 [Artificial Intelligence Theory];
Subject Classification Codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
The problem of multiclass scheduling in a dynamic wireless setting is considered here, where limited available bandwidth resources are allocated to handle random service demand arrivals belonging to different classes in terms of payload data request, delay tolerance, and importance/priority. In addition to heterogeneous traffic, another major challenge stems from random service rates due to time-varying wireless communication channels. Existing scheduling and resource allocation approaches, ranging from simple greedy heuristics and constrained optimization to combinatorics, are tailored to specific network or application configurations and are usually suboptimal. On this account, we resort to deep reinforcement learning (DRL) and propose a distributional Deep Deterministic Policy Gradient (DDPG) algorithm combined with Deep Sets to tackle the aforementioned problem. Furthermore, we present a novel way to use a Dueling Network, which leads to further performance improvement. Our proposed algorithm is tested on both synthetic and real data, showing consistent gains against baseline methods from combinatorics and optimization, and against state-of-the-art scheduling metrics. Our method can, for instance, achieve the same user satisfaction rate as a myopic algorithm using knapsack optimization while consuming 13% less power and bandwidth resources.
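The Deep Sets component mentioned in the abstract lets the policy consume a variable-size set of pending service requests in a permutation-invariant way: each request is encoded independently, the embeddings are pooled by summation, and the pooled vector is decoded into a value. A minimal NumPy sketch of that idea (not the authors' actual architecture; the weight shapes, feature layout, and activation choices below are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative random weights; the paper's real network sizes are not given here.
W_phi = rng.standard_normal((4, 8))  # per-request encoder phi: 4 features -> 8 dims
W_rho = rng.standard_normal(8)       # set-level decoder rho: pooled 8 dims -> scalar

def deep_set_score(requests):
    """Permutation-invariant score for a set of service requests.

    Each row of `requests` holds one request's features, e.g.
    (payload size, delay tolerance, priority, channel gain).
    """
    h = np.tanh(requests @ W_phi)  # phi applied to each request independently
    pooled = h.sum(axis=0)         # sum pooling erases the ordering of the set
    return float(np.tanh(pooled @ W_rho))  # rho maps the pooled embedding to a score

requests = rng.standard_normal((5, 4))        # a set of 5 pending requests
shuffled = requests[rng.permutation(5)]       # same set, different order
assert np.isclose(deep_set_score(requests), deep_set_score(shuffled))
```

Because the sum pooling commutes with any reordering of the rows, the score depends only on the multiset of requests, which is what makes this encoder suitable as a state representation for a scheduler facing random arrivals.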
Pages: 225-241
Page count: 17