Learning-based resource allocation in D2D communications with QoS and fairness considerations

Cited by: 6
Authors
Rashed, Salma Kazemi [1 ]
Shahbazian, Reza [1 ]
Ghorashi, Seyed Ali [1 ,2 ]
Affiliations
[1] Shahid Beheshti Univ, Dept Elect Engn, Cognit Telecommun Res Grp, Tehran, Iran
[2] Shahid Beheshti Univ, Cyberspace Res Inst, Tehran, Iran
Source
TRANSACTIONS ON EMERGING TELECOMMUNICATIONS TECHNOLOGIES | 2018, Vol. 29, No. 1
Keywords
SELECTION; SPECTRUM; CHANNEL; NETWORKS; POLICY; MODE
DOI
10.1002/ett.3249
Chinese Library Classification (CLC)
TN [Electronic Technology, Communication Technology]
Discipline code
0809
Abstract
In device-to-device (D2D) communications, D2D users establish a direct link by reusing the cellular users' spectrum to increase the network's spectral efficiency. However, because cellular users have higher priority, the interference imposed by D2D users on cellular ones must be controlled by channel and power allocation algorithms. Since the distribution of the dynamic channel parameters is unknown, learning-based resource allocation algorithms operate more efficiently than classical optimization methods. In this paper, the problem of joint channel and power allocation for D2D users in realistic scenarios is formulated as an interactive learning problem, in which the channel state information of the selected channels is unknown to the decision center and is learned during the allocation process. To maximize the reward obtained by choosing an action (a channel and power level) for each D2D pair, a recency-based Q-learning method is introduced to find the best channel-power combination for each pair. The proposed method is shown to achieve an asymptotically logarithmic regret function, which makes it an order-optimal policy, and it converges to a stable equilibrium solution. The simulation results confirm that the proposed method outperforms conventional learning methods and random allocation in terms of network sum rate and the fairness criterion.
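To illustrate the allocation loop the abstract describes, the following is a minimal sketch of ε-greedy Q-learning over a joint (channel, power) action space. The reward model, the action-space sizes, and all parameter values here are illustrative assumptions for a single D2D pair; they are not the paper's recency-based formulation or its interference model.

```python
import random

# Illustrative dimensions: 4 candidate channels, 3 power levels per channel.
N_CHANNELS, N_POWERS = 4, 3
ACTIONS = [(c, p) for c in range(N_CHANNELS) for p in range(N_POWERS)]

def run_q_learning(reward_fn, episodes=5000, alpha=0.1, epsilon=0.1, seed=0):
    """Bandit-style Q-learning: one Q-value per (channel, power) action.

    alpha is a constant step size, so recent reward samples are weighted
    more heavily than old ones (a simple stand-in for recency weighting).
    """
    rng = random.Random(seed)
    q = {a: 0.0 for a in ACTIONS}
    for _ in range(episodes):
        # epsilon-greedy: explore a random action, else exploit the best one
        if rng.random() < epsilon:
            a = rng.choice(ACTIONS)
        else:
            a = max(q, key=q.get)
        r = reward_fn(a, rng)
        # incremental update toward the observed reward
        q[a] += alpha * (r - q[a])
    return q

def toy_reward(action, rng):
    """Hypothetical noisy rate: channel 2 is best; higher power helps."""
    channel, power = action
    mean = [0.5, 0.8, 1.5, 0.3][channel] * (1 + 0.2 * power)
    return mean + rng.gauss(0, 0.1)

q = run_q_learning(toy_reward)
best = max(q, key=q.get)
print("best (channel, power):", best)
```

With this toy reward, the learner concentrates its Q-values on channel 2 at the highest power level; in a multi-pair setting, the reward would instead reflect the achieved rate under QoS and interference constraints.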
Pages: 20