Graph reinforcement learning with relational priors for predictive power allocation

Cited by: 1
Authors
Zhao, Jianyu [1]
Yang, Chenyang [1]
Affiliations
[1] Beihang Univ, Sch Elect & Informat Engn, Beijing 100191, People's Republic of China
Funding
National Key R&D Program of China; National Natural Science Foundation of China
Keywords
reinforcement learning; graph neural network; relational priors; resource allocation
DOI
10.1007/s11432-024-4261-1
Chinese Library Classification
TP [Automation Technology; Computer Technology]
Discipline Code
0812
Abstract
Deep reinforcement learning for resource allocation has been investigated extensively owing to its ability to handle model-free and end-to-end problems. However, its slow convergence and high time complexity during online training hinder its practical use in dynamic wireless systems. To reduce the training complexity, we resort to graph reinforcement learning to leverage two kinds of relational priors inherent in many wireless communication problems: topology information and permutation properties. To harness the two priors, we first conceive a method to convert the state matrix into a state graph, and then propose a graph deep deterministic policy gradient (DDPG) algorithm with the desired permutation property. To demonstrate how to apply the proposed methods, we consider a representative reinforcement learning problem, predictive power allocation, which minimizes energy consumption while ensuring the quality-of-service of each user requesting video streaming. We derive the time complexity required in each time step to train the proposed graph DDPG algorithm and the fully connected neural network-based DDPG algorithm. Simulations show that the graph DDPG algorithm converges much faster and requires much lower time and space complexity than existing DDPG algorithms to achieve the same learning performance.
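Since this record contains only the abstract, the following is a minimal, illustrative sketch (not the authors' implementation) of the permutation property the abstract attributes to the graph DDPG policy: a layer whose per-user output depends on that user's own features plus an order-invariant aggregate over all users, so permuting the rows of the state matrix permutes the outputs identically. All names (pe_layer, W_self, W_agg) and the toy dimensions are assumptions introduced only for illustration.

# Illustrative sketch of a permutation-equivariant layer of the kind a graph
# DDPG actor could use; NOT taken from the paper, which gives no code here.
import numpy as np

rng = np.random.default_rng(0)

def pe_layer(X, W_self, W_agg, b):
    """Each user's output uses its own features (W_self) and an
    order-invariant mean over all users (W_agg), so permuting the
    rows of X permutes the rows of the output in the same way."""
    agg = X.mean(axis=0, keepdims=True)            # order-invariant aggregation
    return np.tanh(X @ W_self + agg @ W_agg + b)   # row-wise equivariant update

# Toy sizes: 4 users, 3 per-user state features, 8 hidden units (assumed).
K, F, H = 4, 3, 8
X = rng.standard_normal((K, F))                    # state matrix, one row per user
W_self = rng.standard_normal((F, H))
W_agg = rng.standard_normal((F, H))
b = np.zeros(H)

# Reordering the users reorders the outputs identically, so the policy does
# not have to relearn every user ordering.
perm = rng.permutation(K)
out = pe_layer(X, W_self, W_agg, b)
out_perm = pe_layer(X[perm], W_self, W_agg, b)
print("permutation equivariance holds:", np.allclose(out[perm], out_perm))

In an agent of the kind the abstract describes, layers with this property would typically replace the fully connected layers of the DDPG actor and critic; this is the sort of relational prior the abstract credits for the faster convergence and lower complexity.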
Pages: 18