Make Smart Decisions Faster: Deciding D2D Resource Allocation via Stackelberg Game Guided Multi-Agent Deep Reinforcement Learning

Cited by: 32
Authors
Shi, Dian [1]
Li, Liang [2]
Ohtsuki, Tomoaki [3]
Pan, Miao [1]
Han, Zhu [4,5]
Poor, H. Vincent [6]
Affiliations
[1] Univ Houston, Elect & Comp Engn Dept, Houston, TX 77004 USA
[2] Xidian Univ, Sch Cyber Engn, Xian 710126, Peoples R China
[3] Keio Univ, Dept Informat & Comp Sci, Tokyo 1088345, Japan
[4] Univ Houston, Dept Elect & Comp Engn, Houston, TX 77004 USA
[5] Kyung Hee Univ, Dept Comp Sci & Engn, Seoul 446701, South Korea
[6] Princeton Univ, Dept Elect Engn, Princeton, NJ 08544 USA
Keywords
Device-to-device communication; Resource management; Games; Reinforcement learning; Training; Power control; Interference; Deep reinforcement learning; Stackelberg game; D2D communications; resource allocation; POWER ALLOCATION; MODE SELECTION; NETWORKS; COMMUNICATION; FEEDBACK
DOI
10.1109/TMC.2021.3085206
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Subject Classification Code
0812
Abstract
Device-to-Device (D2D) communication, which enables direct data transmission between two mobile users, has emerged as a vital component of 5G cellular networks for improving spectrum utilization and enhancing system capacity. A critical issue in realizing these benefits in D2D-enabled networks is to properly allocate radio resources while coordinating co-channel interference in a time-varying communication environment. In this paper, we propose a Stackelberg game (SG) guided multi-agent deep reinforcement learning (MADRL) approach, which allows D2D users to make smart power control and channel allocation decisions in a distributed manner. In particular, we define a crucial Stackelberg Q-value (ST-Q) to guide the learning direction, which is calculated from the equilibrium achieved in the Stackelberg game. With the guidance of the Stackelberg equilibrium, our approach converges in fewer iterations than the general MADRL method and thereby handles network dynamics better. After the initial training, each agent can infer timely D2D resource allocation strategies through distributed execution. Extensive simulations validate the efficacy of the proposed scheme in developing timely resource allocation strategies. The results also show that our method outperforms the general MADRL-based approach in terms of average utility, channel capacity, and training time.
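To make the ST-Q idea concrete, the following is a minimal, hypothetical Python sketch of how a Stackelberg-equilibrium-guided Q update could look for two agents in a simplified tabular setting. All names (stackelberg_equilibrium, st_q_update, the state and action sizes) are assumptions chosen for illustration; the sketch does not reproduce the paper's deep MADRL architecture, reward design, or D2D channel model.

# Illustrative sketch only: a simplified, tabular two-agent version of the
# Stackelberg-game-guided Q-learning idea described in the abstract.
import numpy as np

N_STATES = 8
LEADER_ACTIONS = 4      # e.g., candidate transmit power levels (assumed)
FOLLOWER_ACTIONS = 4    # e.g., candidate channel choices (assumed)

# Each agent keeps a joint-action Q-table: Q[state, leader_action, follower_action]
Q_leader = np.zeros((N_STATES, LEADER_ACTIONS, FOLLOWER_ACTIONS))
Q_follower = np.zeros((N_STATES, LEADER_ACTIONS, FOLLOWER_ACTIONS))

def stackelberg_equilibrium(q_l, q_f):
    """Return the (leader, follower) action pair at the Stackelberg equilibrium
    of the one-state game defined by the two agents' Q-values.

    The follower best-responds to each candidate leader action; the leader
    then picks the action whose induced best response maximizes its own Q-value."""
    best_response = q_f.argmax(axis=1)  # follower's reply to each leader action
    leader_payoff = q_l[np.arange(q_l.shape[0]), best_response]
    a_l = int(leader_payoff.argmax())
    a_f = int(best_response[a_l])
    return a_l, a_f

def st_q_update(s, a_l, a_f, r_l, r_f, s_next, alpha=0.1, gamma=0.9):
    """One Stackelberg-guided update: bootstrap each agent's Q-value with the
    value it would obtain at the Stackelberg equilibrium of the next state."""
    eq_l, eq_f = stackelberg_equilibrium(Q_leader[s_next], Q_follower[s_next])
    target_l = r_l + gamma * Q_leader[s_next, eq_l, eq_f]
    target_f = r_f + gamma * Q_follower[s_next, eq_l, eq_f]
    Q_leader[s, a_l, a_f] += alpha * (target_l - Q_leader[s, a_l, a_f])
    Q_follower[s, a_l, a_f] += alpha * (target_f - Q_follower[s, a_l, a_f])

The point of the sketch is the bootstrap target: instead of taking a max over an agent's own actions, as in standard single-agent Q-learning, each agent evaluates the next state at the joint action returned by the Stackelberg equilibrium, in which the leader chooses while anticipating the follower's best response.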
Pages: 4426-4438
Page count: 13