Distributed Learning for Optimal Spectrum Access in Dense Device-to-Device Ad-Hoc Networks

Cited by: 2

Authors:

Boyarski, Tomer [1]

Wang, Wenbo [2]

Leshem, Amir [1]

Affiliations:

[1] Bar Ilan Univ, Fac Engn, IL-5290002 Ramat Gan, Israel

[2] Kunming Univ Sci & Technol KUST, Fac Mech & Elect Engn, Kunming 650500, Peoples R China

Source:

IEEE TRANSACTIONS ON SIGNAL PROCESSING | 2023 / Vol. 71

Keywords:

Resource management; Device-to-device communication; Quality of service; Ad hoc networks; Time-frequency analysis; Signal processing algorithms; Multiaccess communication; Multi-agent multi-armed bandit; D2D networks; resource allocation; distributed network management; MULTIARMED BANDIT; RESOURCE-ALLOCATION; MODEL;

DOI:

10.1109/TSP.2023.3300630

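Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification code: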

0808 ; 0809 ;

Abstract:

In 5G networks, Device-to-Device (D2D) communications aim to provide dense coverage without relying on the cellular network infrastructure. To achieve this goal, the D2D links are expected to be capable of self-organizing and allocating finite, interfering resources with limited inter-link coordination. We consider a dense ad-hoc D2D network and propose a decentralized time-frequency allocation mechanism that achieves sub-linear social regret toward optimal spectrum efficiency. The proposed mechanism is constructed in the framework of multi-agent multi-armed bandits, which employs the carrier-sensing-based distributed auction to learn the optimal allocation of time-frequency blocks with different channel state dynamics from scratch. Our theoretical analysis shows that the proposed fully distributed mechanism achieves a logarithmic regret bound by adopting an epoch-based strategy-learning scheme when the length of the strategy-exploitation window is exponentially growing. We further propose an implementation-friendly protocol featuring a fixed exploitation window, which guarantees a good tradeoff between performance optimality and protocol efficiency. Numerical simulations demonstrate that the proposed protocol achieves higher efficiency than the prevalent reference algorithms in both static and dynamic wireless environments.
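The abstract links the logarithmic regret bound to an exponentially growing exploitation window. The sketch below is not the paper's protocol. It is a toy count, under assumed parameters, of how many rounds an epoch-based scheme spends exploring. It assumes each epoch k has a fixed number of exploration rounds followed by an exploitation window whose length is set by a schedule function. The names `exploration_rounds`, `explore_len`, and `exploit_len`, and the numeric values, are hypothetical choices for illustration only.

```python
def exploration_rounds(horizon, explore_len, exploit_len):
    """Count the rounds spent exploring within `horizon` total rounds.

    Epoch k (k = 1, 2, ...) consists of `explore_len` exploration rounds
    followed by `exploit_len(k)` exploitation rounds. Both parameters are
    illustrative assumptions, not values taken from the paper.
    """
    t = 0          # rounds elapsed so far
    explored = 0   # rounds spent exploring so far
    k = 1          # epoch index
    while t < horizon:
        step = min(explore_len, horizon - t)   # exploration phase
        explored += step
        t += step
        t += min(exploit_len(k), horizon - t)  # exploitation phase
        k += 1
    return explored


def growing(k):
    """Exponentially growing exploitation window (assumed base 2)."""
    return 2 ** k


def fixed(k):
    """Fixed exploitation window (assumed length 90)."""
    return 90


if __name__ == "__main__":
    for horizon in (10**3, 10**4, 10**5, 10**6):
        print(horizon,
              exploration_rounds(horizon, 10, growing),
              exploration_rounds(horizon, 10, fixed))
```

With the growing schedule, the number of epochs, and hence the exploration cost, grows roughly with the logarithm of the horizon. With a fixed window, the exploration cost grows linearly in the horizon but every epoch has the same length, which is the kind of optimality-versus-simplicity tradeoff the abstract mentions for the fixed-window protocol.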


Pages: 3149-3163

Number of pages: 15

