Multi-Agent Deep Reinforcement Learning Based Spectrum Allocation for D2D Underlay Communications

Cited by: 124
Authors
Li, Zheng [1 ]
Guo, Caili [2 ]
Affiliations
[1] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing 100876, Peoples R China
[2] Beijing Univ Posts & Telecommun, Sch Informat & Commun Engn, Beijing Key Lab Network Syst Architecture & Conve, Beijing 100876, Peoples R China
Funding
National Key R&D Program of China; National Natural Science Foundation of China;
Keywords
Device-to-device (D2D) communications; multi-agent deep reinforcement learning; spectrum allocation; resource allocation; algorithm; networks; scheme;
DOI
10.1109/TVT.2019.2961405
Chinese Library Classification
TM [Electrical Engineering]; TN [Electronics and Communication Technology];
Discipline Code
0808; 0809;
Abstract
Device-to-device (D2D) communication underlaying cellular networks is a promising technique for improving spectrum efficiency. In this setting, D2D transmissions may cause severe interference to both cellular and other D2D links, which poses a great technical challenge for spectrum allocation. Existing centralized schemes require global information, which incurs a large signaling overhead, while existing distributed schemes require frequent information exchange among D2D users and cannot achieve global optimization. In this paper, a distributed spectrum allocation framework based on multi-agent deep reinforcement learning, named multi-agent actor-critic (MAAC), is proposed. MAAC shares global historical states, actions, and policies during centralized training, requires no signaling interaction during execution, and exploits cooperation among users to further optimize system performance. Moreover, to reduce the computational complexity of training, we further propose the neighbor-agent actor-critic (NAAC), which uses only neighboring users' historical information for centralized training. Simulation results show that the proposed MAAC and NAAC effectively reduce the outage probability of cellular links, greatly improve the sum rate of D2D links, and converge quickly.
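The abstract's core idea is the centralized-training / decentralized-execution pattern: a centralized critic observes all users' states and actions during training, while each D2D actor selects a channel from its local state alone at execution time, with no signaling. The toy NumPy sketch below illustrates only this pattern; the agent count, channel count, state features, reward function, and linear actor/critic models are all hypothetical simplifications for illustration and are not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

N_AGENTS = 3      # hypothetical number of D2D pairs
N_CHANNELS = 4    # hypothetical number of shared cellular channels
STATE_DIM = 2     # toy local state, e.g. (own channel gain, sensed interference)

class Actor:
    """Decentralized actor: maps a *local* state to a channel distribution."""
    def __init__(self):
        self.W = np.zeros((STATE_DIM, N_CHANNELS))

    def probs(self, s):
        z = s @ self.W
        e = np.exp(z - z.max())          # numerically stable softmax
        return e / e.sum()

    def act(self, s):
        return int(rng.choice(N_CHANNELS, p=self.probs(s)))

class CentralCritic:
    """Centralized critic: scores the *joint* state-action, training only."""
    def __init__(self):
        self.w = np.zeros(N_AGENTS * (STATE_DIM + N_CHANNELS))

    def value(self, joint_feat):
        return self.w @ joint_feat

def joint_features(states, actions):
    # Concatenate every agent's local state with a one-hot of its action.
    feats = []
    for s, a in zip(states, actions):
        one_hot = np.zeros(N_CHANNELS)
        one_hot[a] = 1.0
        feats.append(np.concatenate([s, one_hot]))
    return np.concatenate(feats)

def toy_reward(actions):
    # Toy proxy for system utility: fewer D2D pairs colliding on a channel.
    return len(set(actions)) / N_AGENTS

actors = [Actor() for _ in range(N_AGENTS)]
critic = CentralCritic()

# Centralized training: the critic sees everyone; each actor is updated by a
# policy gradient, using the critic's value of the joint sample as a baseline.
for episode in range(2000):
    states = [rng.normal(size=STATE_DIM) for _ in range(N_AGENTS)]
    actions = [actor.act(s) for actor, s in zip(actors, states)]
    r = toy_reward(actions)
    feat = joint_features(states, actions)
    advantage = r - critic.value(feat)
    critic.w += 0.01 * advantage * feat           # critic regression step
    for actor, s, a in zip(actors, states, actions):
        grad = -actor.probs(s)
        grad[a] += 1.0                            # grad of log softmax policy
        actor.W += 0.05 * advantage * np.outer(s, grad)

# Decentralized execution: each actor picks a channel from local state alone,
# with no information exchange between users.
exec_states = [rng.normal(size=STATE_DIM) for _ in range(N_AGENTS)]
exec_actions = [actor.act(s) for actor, s in zip(actors, exec_states)]
print(exec_actions)
```

The NAAC variant described in the abstract would shrink the critic's input from all agents' histories to those of each user's neighbors, which reduces the joint feature dimension and hence the training cost; the sketch above corresponds to the full MAAC case.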
Pages: 1828-1840
Page count: 13