S-MFRL: Spiking Mean Field Reinforcement Learning for Dynamic Resource Allocation of D2D Networks

Cited by: 6
Authors
Ye, Pei-Gen [1 ]
Wang, Yuan-Gen [1 ]
Tang, Weixuan [2 ]
Affiliations
[1] Guangzhou Univ, Sch Comp Sci & Cyber Engn, Guangzhou 510006, Peoples R China
[2] Guangzhou Univ, Inst Artificial Intelligence & Blockchain, Guangzhou 510006, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Device-to-device communication; Resource management; Reinforcement learning; Optimization; Copper; Interference; Training; Channel selection; deep reinforcement learning; device-to-device; multi-agent reinforcement learning; power control; spiking neural networks; POWER ALLOCATION; STEGANOGRAPHY; SPECTRUM; GAME;
DOI
10.1109/TVT.2022.3203050
CLC Classification
TM [Electrical Technology]; TN [Electronic Technology, Communication Technology];
Subject Classification Codes
0808 ; 0809 ;
Abstract
Device-to-device (D2D) technology has been widely used to address the mobile traffic explosion problem, owing to its ability to support direct communication between proximal devices. In practice, the available spectrum is often limited, and as the numbers of D2D users and cellular users grow rapidly, the efficiency of resource allocation drops dramatically. To overcome these limitations, this paper proposes a spiking mean field reinforcement learning framework (S-MFRL) to optimize the resource allocation of D2D networks. First, a spiking neural network (SNN) combined with deep reinforcement learning is trained for channel selection and power control. Second, spatio-temporal backpropagation is adopted to accelerate SNN training. Third, mean field multi-agent reinforcement learning (MFRL) is applied to approximate the interactions among D2D users. In this way, the optimization of resource allocation remains tractable as the number of D2D users increases, which avoids the exponential growth of pairwise user interactions. Two algorithms are implemented under the S-MFRL framework by integrating MFRL into spiking actor-critic (S-AC) and spiking proximal policy optimization (S-PPO), named S-MFAC and S-MFPPO, respectively. Experimental results show that S-MFAC and S-MFPPO outperform both AC and PPO in terms of convergence rate, access rate, time-averaged overall throughput, and collision probability. In addition, further simulations verify the effectiveness of the proposed algorithms with larger action spaces and hundreds of D2D users.
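The core mean-field idea summarized above can be sketched as follows: instead of conditioning each agent's value function on the joint actions of all other D2D users (which grows exponentially with the user count), each agent conditions on the empirical mean of its neighbors' actions. This is an illustrative sketch, not the authors' implementation; the function name and the discrete channel-action encoding are assumptions for the example.

```python
import numpy as np

def mean_field_action(neighbor_actions, num_actions):
    """Approximate the joint effect of many neighbors by the empirical
    mean of their one-hot encoded discrete actions (e.g., channel choices).

    neighbor_actions: integer array of shape (N,), each in [0, num_actions)
    returns: array of shape (num_actions,), a distribution-like summary
    that a mean-field Q-function can take as input in place of N actions.
    """
    one_hot = np.eye(num_actions)[neighbor_actions]  # shape (N, num_actions)
    return one_hot.mean(axis=0)

# Hypothetical example: 5 neighboring D2D pairs choosing among 4 channels.
actions = np.array([0, 1, 1, 3, 1])
mf = mean_field_action(actions, num_actions=4)  # -> [0.2, 0.6, 0.0, 0.2]
```

The design point is that the mean-field summary has fixed size `num_actions` regardless of how many neighbors there are, which is what keeps training tractable with hundreds of D2D users.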
Pages: 1032-1047
Number of pages: 16