Downlink Scheduler for Delay Guaranteed Services Using Deep Reinforcement Learning

Citations: 3
Authors
Ji, Jiequ [1 ]
Ren, Xiangyu [2 ]
Cai, Lin [2 ]
Zhu, Kun [1 ]
Affiliations
[1] Nanjing Univ Aeronaut & Astronaut, Coll Comp Sci & Technol, Nanjing 210016, Peoples R China
[2] Univ Victoria, Dept Elect & Comp Engn, Victoria, BC V8W 3P6, Canada
Funding
National Natural Science Foundation of China;
Keywords
Delays; Scheduling; Downlink; Throughput; Optimal scheduling; Wireless networks; Simulation; Resource allocation; packet selection; delay and network utility optimality; deep reinforcement learning; RESOURCE-ALLOCATION; THROUGHPUT; STABILITY; ALGORITHM; SYSTEMS; DECOMPOSITION; FAIRNESS; POLICIES;
DOI
10.1109/TMC.2023.3276697
Chinese Library Classification (CLC) Number
TP [Automation Technology; Computer Technology];
Discipline Code
0812;
Abstract
In this article, we propose a novel scheduling scheme to guarantee per-packet delay in single-hop wireless networks for delay-critical applications. We consider several classes of packets with different delay requirements, where high-class packets yield high utility after successful transmission. Considering the correlation of delays among competing packets, we apply a delay-laxity concept and introduce a new output gain function for scheduling decisions. In particular, the selection of a packet takes into account not only its output gain but also the delay laxity of other packets. In this context, we formulate a multi-objective optimization problem that minimizes the average queue length while maximizing the average output gain, subject to the per-packet delay guarantee. However, due to uncertainty in the environment (e.g., time-varying channel conditions and random packet arrivals), this problem is difficult and often impractical to solve with traditional optimization techniques. We therefore develop a deep reinforcement learning (DRL)-based framework to solve it. Specifically, we decompose the original optimization problem into a set of scalar optimization subproblems and model each of them as a partially observable Markov decision process (POMDP). We then resort to a Double Deep Q-Network (DDQN)-based algorithm to learn an optimal scheduling policy for each subproblem, which can handle the large-scale state space and reduce Q-value overestimation. Simulation results show that our proposed DDQN-based algorithm outperforms the conventional Q-learning algorithm in terms of reward and learning speed. In addition, our proposed scheduling scheme achieves significant reductions in average delay and delay-outage drop rate compared to other benchmark schemes.
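The overestimation reduction mentioned in the abstract comes from DDQN's decoupling of action selection and action evaluation: the online network picks the greedy action, while a separate target network scores it. The sketch below is purely illustrative of that general mechanism, not the authors' implementation; the function name, action count, and Q-value arrays are hypothetical.

```python
import numpy as np

def ddqn_target(q_online, q_target, reward, gamma=0.99):
    """Double-DQN bootstrap target.

    The online network's Q-values select the greedy action; the target
    network's Q-values evaluate it. Decoupling selection from evaluation
    is what curbs the Q-value overestimation of vanilla DQN.
    """
    a_star = int(np.argmax(q_online))          # action selection (online net)
    return reward + gamma * q_target[a_star]   # action evaluation (target net)

# Hypothetical next-state Q-value estimates over 4 scheduling actions
# (e.g., which packet class to serve next).
q_online = np.array([1.0, 2.5, 0.3, 1.7])
q_target = np.array([0.9, 2.0, 0.5, 1.6])
y = ddqn_target(q_online, q_target, reward=1.0)
```

Here the online network prefers action 1, but the target is built from the target network's estimate for that action (2.0, not 2.5), so a single network's upward estimation errors do not compound through both the max and the bootstrap value.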
Pages: 3376-3390
Number of pages: 15