Deep Reinforcement Learning for Distributed Dynamic MISO Downlink-Beamforming Coordination

Cited by: 60

Authors:

Ge, Jungang [1]

Liang, Ying-Chang [1]

Joung, Jingon [2]

Sun, Sumei [3]

Affiliations:

[1] Univ Elect Sci & Technol China UESTC, Ctr Intelligent Networking & Commun CINC, Chengdu 611731, Peoples R China

[2] Chung Ang Univ, Sch Elect & Elect Engn, Seoul 06974, South Korea

[3] Inst Infocomm Res, Singapore 138632, Singapore

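Source:

IEEE TRANSACTIONS ON COMMUNICATIONS | 2020 / Vol. 68 / No. 10

Funding:

National Research Foundation, Singapore; National Natural Science Foundation of China; National Key Research and Development Program of China;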

Keywords:

Downlink; Intercell interference; Wireless communication; MISO communication; Data communication; Reinforcement learning; Downlink-beamforming coordination; multi-input single-output (MISO) interference channel; deep reinforcement learning; interference mitigation; SUM-RATE MAXIMIZATION; POWER-CONTROL; NETWORKS; SYSTEMS;

DOI:

10.1109/TCOMM.2020.3004524

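Chinese Library Classification (CLC):

TM [Electrical engineering]; TN [Electronic technology, communication technology];

Discipline classification codes: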

0808 ; 0809 ;

Abstract:

We consider a homogeneous cellular network where a multi-antenna base station (BS) in each cell transmits messages to its intended user over a common frequency band. To improve the system capacity of this multi-cell multi-input single-output (MISO) interference channel, one of the state-of-the-art algorithms, namely, downlink-beamforming coordination, allows all BSs to cooperate with one another to mitigate the effect of inter-cell interference. However, most existing algorithms are suboptimal and impractical in a dynamic wireless environment, due to their high computational complexity and the overhead involved in collecting global channel state information (CSI). In this study, we exploit deep reinforcement learning (DRL) and propose a distributed dynamic downlink-beamforming coordination (DDBC) method with partial observability of the CSI. Each BS trains its own deep Q-network and employs an appropriate beamformer depending on its environment, which it observes through a purpose-designed limited-information exchange protocol. The simulation results show that the proposed DRL-based DDBC method, with a considerably lower system overhead, achieves a system capacity that is very close to that of the fractional programming algorithm with global and instantaneous CSI measurements. In addition, this work demonstrates the potential of utilizing DRL to solve DDBC problems in a more practical manner.
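To make the quantity being optimized concrete, the following is a minimal sketch of how the sum rate of a multi-cell MISO interference channel is commonly evaluated when inter-cell interference is treated as noise. This record does not state the paper's exact objective, so the formula choice, the function names `inner` and `sum_rate`, the indexing convention `H[j][k]`, and the toy channel values are all assumptions made for illustration.

```python
import math

def inner(h, w):
    """Return h^H w for two equal-length complex vectors (hypothetical helper)."""
    return sum(hi.conjugate() * wi for hi, wi in zip(h, w))

def sum_rate(H, W, noise_power):
    """Sum rate in bits/s/Hz of a K-cell MISO interference channel.

    Assumed convention: H[j][k] is the channel vector from BS j to the
    user in cell k, and W[j] is the beamformer applied by BS j.
    Interference from other cells is treated as noise.
    """
    K = len(W)
    total = 0.0
    for k in range(K):
        # Useful signal power received by user k from its own BS.
        signal = abs(inner(H[k][k], W[k])) ** 2
        # Power leaking into cell k from every other BS.
        interference = sum(
            abs(inner(H[j][k], W[j])) ** 2 for j in range(K) if j != k
        )
        sinr = signal / (interference + noise_power)
        total += math.log2(1.0 + sinr)
    return total

# Toy two-cell, two-antenna example with made-up channels.
H = [
    [[1 + 0j, 0j], [0j, 1 + 0j]],  # channels from BS 0 to users 0 and 1
    [[1 + 0j, 0j], [0j, 1 + 0j]],  # channels from BS 1 to users 0 and 1
]
W_coordinated = [[1 + 0j, 0j], [0j, 1 + 0j]]  # each BS avoids the other user
W_colliding = [[1 + 0j, 0j], [1 + 0j, 0j]]    # BS 1 points at user 0 instead

rate_coordinated = sum_rate(H, W_coordinated, noise_power=1.0)
rate_colliding = sum_rate(H, W_colliding, noise_power=1.0)
```

In this toy setup the coordinated beamformers give a higher sum rate than the colliding ones, which is the effect beamforming coordination aims for. In a DRL formulation, a per-BS agent would pick its beamformer from local observations instead of from the full channel set `H` used here.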

Pages: 6070-6085

Page count: 16
