Constrained Risk-Sensitive Deep Reinforcement Learning for eMBB-URLLC Joint Scheduling

Cited by: 0

Authors:

Zhang, Wenheng [1]

Derakhshani, Mahsa [1]

Zheng, Gan [2]

Lambotharan, Sangarapillai [3]

Affiliations:

[1] Loughborough Univ, Wolfson Sch, Signal Proc & Networks Res Grp, Loughborough, England

[2] Univ Warwick, Sch Engn, Coventry CV4 7AL, England

[3] Loughborough Univ, Inst Digital Technol, London E20 3BS, England

Source:

IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS | 2024 / Vol. 23 / Issue 09

Funding:

Engineering and Physical Sciences Research Council (EPSRC), UK;

Keywords:

Ultra reliable low latency communication; Resource management; Reliability; Dynamic scheduling; Wireless communication; Optimization; Uncertainty; eMBB; deep reinforcement learning; punctured scheduling; resource allocation; risk-sensitive; URLLC; 5G; OPTIMIZATION; COEXISTENCE;

DOI:

10.1109/TWC.2024.3373722

Chinese Library Classification:

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

In this work, we employ a constrained risk-sensitive deep reinforcement learning (CRS-DRL) approach for joint scheduling in a dynamic multiplexing scenario involving enhanced mobile broadband (eMBB) and ultra-reliable low-latency communications (URLLC). Our scheduling policy minimizes the adverse impact of URLLC puncturing on eMBB users while satisfying URLLC requirements. Conventional DRL-based algorithms for eMBB/URLLC scheduling prioritize maximizing the expected return. However, for URLLC mission-critical applications, it is crucial to explicitly avoid catastrophic scheduling failures associated with the long tail of the reward distribution. Therefore, robust management of such uncertainties and risks is imperative. Our proposed CRS-DRL algorithm incorporates the conditional Value-at-Risk (CVaR) as the risk criterion for optimization. A URLLC queuing mechanism is considered to decrease the URLLC drops and increase eMBB throughput compared to the instant scheduling policy. Our architecture is based on the actor-critic model but considers a transfer function to obtain feasible solutions of the unconstrained actor network, and the critic predicts the entire distribution over future returns instead of simply the expectation. Numerical results indicate that our CRS-DRL algorithm, under varying CVaR levels, achieves similar expected returns but reduces long-tail behavior for long-term rewards compared to the risk-neutral approach.
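The CVaR criterion named in the abstract can be illustrated with a minimal sketch. It is not the authors' CRS-DRL implementation: it only computes an empirical lower-tail CVaR from a list of sampled returns, under the assumption that the return distribution is available as equally weighted samples (for example, quantile outputs of a distributional critic). The function names `empirical_cvar` and `sample_returns`, the level `alpha`, and the synthetic reward values are all hypothetical.

```python
import math
import random


def empirical_cvar(returns, alpha):
    """Lower-tail CVaR: mean of the worst ceil(alpha * n) returns.

    Assumes equally weighted samples; alpha = 1.0 reduces to the plain mean,
    which corresponds to the risk-neutral objective.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not returns:
        raise ValueError("returns must be non-empty")
    ordered = sorted(returns)                     # worst returns first
    k = max(1, math.ceil(alpha * len(ordered)))   # size of the tail
    return sum(ordered[:k]) / k


def sample_returns(n, seed=0):
    """Hypothetical long-tailed returns: usually near 10, rarely -50."""
    rng = random.Random(seed)
    return [
        -50.0 if rng.random() < 0.05 else rng.gauss(10.0, 1.0)
        for _ in range(n)
    ]


if __name__ == "__main__":
    samples = sample_returns(10_000)
    print("mean      :", empirical_cvar(samples, 1.0))
    print("CVaR(0.10):", empirical_cvar(samples, 0.10))
```

In this toy setting, rare catastrophic outcomes move the mean only a little, while the CVaR at a small `alpha` is dominated by them. That is the tail behavior a risk-sensitive objective penalizes and a risk-neutral one largely ignores.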


Pages: 10608-10624

Page count: 17

