Double Deep Q-Network Based Distributed Resource Matching Algorithm for D2D Communication

Cited by: 25
Authors
Yuan, Yazhou [1 ,2 ]
Li, Zhijie [1 ,2 ]
Liu, Zhixin [1 ,2 ]
Yang, Yi [1 ,2 ]
Guan, Xinping [3 ]
Affiliations
[1] Engn Res Ctr of Intelligent Control Syst & Intelligent Equipment, Minist Educ, Qinhuangdao 066004, Hebei, Peoples R China
[2] Yanshan Univ, Sch Elect Engn, Qinhuangdao 066004, Hebei, Peoples R China
[3] Shanghai Jiao Tong Univ, Dept Automat, Shanghai 200240, Peoples R China
Funding
National Key Research and Development Program of China; National Natural Science Foundation of China
Keywords
Device-to-device communication; Resource management; Cellular networks; Reinforcement learning; Games; Deep learning; Copper; Device-to-device communications; deep reinforcement learning; communication resource; non-cooperative game; MULTIPLE-ACCESS; ALLOCATION;
DOI
10.1109/TVT.2021.3130159
Chinese Library Classification (CLC)
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
Device-to-Device (D2D) communication with short communication distances is an efficient way to improve spectrum efficiency and mitigate interference. To realize the optimal resource configuration, including wireless channel matching and power allocation, a distributed resource matching scheme based on deep reinforcement learning (DRL) is proposed. The reward is defined as the difference between the achievable rate of a D2D user and its consumed power, constrained by the Signal to Interference plus Noise Ratio (SINR) requirements of the cellular users on the current channel. The proposed algorithm maximizes the D2D throughput and energy efficiency in a distributed manner, without online coordination or message exchange between users. The considered resource allocation problem is formulated as a stochastic non-cooperative game with multiple players (D2D pairs), where each player is a learning agent whose task is to learn its best strategy from locally observed information. A multi-user communication resource matching algorithm based on a Double Deep Q-Network (DDQN) is then proposed, under which the total cellular throughput and user energy efficiency converge to the Nash equilibrium (NE) of the mixed strategy. Simulation results show that the proposed algorithm improves the communication rate and energy efficiency of each user by selecting the optimal strategy, and achieves better convergence performance than existing schemes.
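The abstract describes two concrete ingredients: a per-agent reward (achievable D2D rate minus weighted transmit power, gated by the SINR requirement of the co-channel cellular user) and a DDQN learner whose action is a joint channel/power-level choice. The sketch below illustrates only these two pieces; the network size, the power weight beta, the penalty value, and all function names are illustrative assumptions made for this sketch, not the authors' implementation.

```python
# Minimal sketch of the reward and DDQN target suggested by the abstract.
# All names and constants here are assumptions for illustration only.
import torch
import torch.nn as nn

def reward(rate_d2d, tx_power, cellular_sinr, sinr_min, beta=0.1, penalty=-1.0):
    """Reward = achievable D2D rate minus weighted power, gated by the
    SINR requirement of the cellular user sharing the chosen channel."""
    if cellular_sinr < sinr_min:      # constraint violated -> penalize
        return penalty
    return rate_d2d - beta * tx_power

class QNet(nn.Module):
    """Small fully connected Q-network over joint (channel, power-level) actions."""
    def __init__(self, state_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, s):
        return self.net(s)

def ddqn_target(online, target, r, s_next, done, gamma=0.99):
    """Double DQN target: the online network selects the next action,
    the target network evaluates it (decoupled selection and evaluation)."""
    with torch.no_grad():
        a_star = online(s_next).argmax(dim=1, keepdim=True)   # action selection
        q_next = target(s_next).gather(1, a_star).squeeze(1)  # action evaluation
        return r + gamma * (1.0 - done) * q_next
```

The decoupling of action selection and evaluation is what distinguishes DDQN from standard DQN and reduces Q-value overestimation, which is the convergence advantage the abstract attributes to the proposed algorithm.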
Pages: 984-993
Number of Pages: 10