Hierarchical Reinforcement Learning for Relay Selection and Power Optimization in Two-Hop Cooperative Relay Network

Times cited: 31

Authors:

Geng, Yuanzhe [1]

Liu, Erwu [1]

Wang, Rui [1,2]

Liu, Yiming [1]

Affiliations:

[1] Tongji Univ, Coll Elect & Informat Engn, Shanghai 201804, Peoples R China

[2] Tongji Univ, Shanghai Inst Intelligent Sci & Technol, Shanghai 201804, Peoples R China

Source:

IEEE TRANSACTIONS ON COMMUNICATIONS | 2022 / Vol. 70 / No. 01

Funding:

U.S. National Science Foundation

Keywords:

Relays; Resource management; Probability; Power system reliability; Optimization; Signal to noise ratio; Relay networks (telecommunication); Cooperative communication; outage probability; relay selection; power allocation; hierarchical reinforcement learning; RESOURCE-ALLOCATION; DIVERSITY;

DOI:

10.1109/TCOMM.2021.3119689

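Chinese Library Classification (CLC) numbers:

TM [Electrical engineering]; TN [Electronic technology, communication technology]

Discipline codes:

0808; 0809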

Abstract:

In this paper, we study the outage probability minimization problem in a two-hop cooperative relay network. To reduce outage probability, existing studies propose many schemes for relay selection and power allocation, which are usually based on the assumption of exact channel state information (CSI). However, it is difficult to obtain perfect instantaneous CSI in practical situations where channel states change rapidly, and thus traditional methods may not perform well. Considering these factors, we turn to emerging reinforcement learning (RL) methods for solutions. RL methods do not need any prior knowledge of CSI; instead, they use neural networks for approximation and decision-making after interacting with the communication environment. Nevertheless, conventional RL methods, including most deep reinforcement learning (DRL) methods, cannot perform well when the search space is too large. In addition, non-stationarity is a common problem when using hierarchical reinforcement learning (HRL), which is caused by the changing behavior in different hierarchies. Therefore, we first propose a DRL framework with an outage-based reward function, which is then used as a baseline. Then, we further design an HRL framework and training algorithm. By decomposing relay selection and power allocation into two hierarchical optimization objectives, and combining on-policy and off-policy methods in the HRL framework, our method successfully addresses the sparse-reward and non-stationarity problems. Simulation results reveal that, compared with the traditional DRL method, the proposed HRL training algorithm converges faster and reduces the outage probability by 8% in a two-hop relay network with the same outage threshold.
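The system model named in the abstract (two-hop relaying, relay selection, power allocation, and an outage-based reward) can be illustrated with a small Monte Carlo sketch. It assumes Rayleigh fading, half-duplex decode-and-forward relaying, a fixed split of a total power budget between source and relay, and a binary reward of 0 on success and -1 on outage; these modeling details, the parameter values, and all function names are hypothetical and not taken from the paper. The relay is chosen by the classical max-min rule with perfect CSI, i.e. the kind of CSI-dependent baseline the abstract contrasts with, not the proposed HRL algorithm.

```python
import math
import random

# Hypothetical parameters for illustration only (not from the paper).
NUM_RELAYS = 4
RATE_THRESHOLD = 1.0  # target end-to-end rate in bit/s/Hz
NOISE_POWER = 1.0


def rayleigh_gain(rng: random.Random, mean: float = 1.0) -> float:
    """|h|^2 of a Rayleigh-fading channel is exponentially distributed."""
    return rng.expovariate(1.0 / mean)


def end_to_end_snr(p_source: float, p_relay: float, g_sr: float, g_rd: float,
                   noise: float = NOISE_POWER) -> float:
    """Decode-and-forward (assumed): the weaker hop limits the end-to-end SNR."""
    return min(p_source * g_sr / noise, p_relay * g_rd / noise)


def is_outage(snr: float, rate_threshold: float = RATE_THRESHOLD) -> bool:
    """Half-duplex two-hop transmission uses two time slots, hence the 1/2 factor."""
    return 0.5 * math.log2(1.0 + snr) < rate_threshold


def outage_reward(snr: float, rate_threshold: float = RATE_THRESHOLD) -> float:
    """Assumed binary outage-based reward: 0 on success, -1 on outage."""
    return -1.0 if is_outage(snr, rate_threshold) else 0.0


def select_relay(p_source: float, p_relay: float, g_sr: list, g_rd: list) -> int:
    """Max-min relay selection with perfect CSI (classical baseline, not HRL)."""
    return max(range(len(g_sr)),
               key=lambda k: end_to_end_snr(p_source, p_relay, g_sr[k], g_rd[k]))


def estimate_outage(total_power: float, source_fraction: float,
                    trials: int = 20000, seed: int = 0) -> float:
    """Monte Carlo estimate of outage probability for a given power split."""
    rng = random.Random(seed)
    p_s = source_fraction * total_power
    p_r = (1.0 - source_fraction) * total_power
    outages = 0
    for _ in range(trials):
        g_sr = [rayleigh_gain(rng) for _ in range(NUM_RELAYS)]
        g_rd = [rayleigh_gain(rng) for _ in range(NUM_RELAYS)]
        k = select_relay(p_s, p_r, g_sr, g_rd)
        if is_outage(end_to_end_snr(p_s, p_r, g_sr[k], g_rd[k])):
            outages += 1
    return outages / trials


if __name__ == "__main__":
    # Sweep the source/relay power split for a fixed total power budget.
    for frac in (0.3, 0.5, 0.7):
        print(frac, estimate_outage(total_power=10.0, source_fraction=frac))
```

With a total power of 10 (linear), an equal split, and four i.i.d. relays, max-min selection is in outage only when every relay is, so the estimate should land close to the closed-form value (1 - exp(-1.2))^4 ≈ 0.238. In an HRL setup of the kind the abstract describes, one level could choose the relay index and the other the power split, with `outage_reward` as the feedback signal; the learning agents themselves are outside this sketch.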

Pages: 171-184

Number of pages: 14

References (43 in total):

[1] Amin, Osama; Ikki, Salama Said; Uysal, Murat. On the Performance Analysis of Multirelay Cooperative Diversity Systems With Channel Estimation Errors [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2011, 60(05): 2050-2059.

[2] Annavajjala, Ramesh; Cosman, Pamela C.; Milstein, Laurence B. Statistical channel knowledge-based optimum power allocation for relaying protocols in the high SNR regime [J]. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2007, 25(02): 292-305.

[3] Bae, Jimin; Han, Youngnam. Joint Power and Time Allocation for Two-Way Cooperative NOMA [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2019, 68(12): 12443-12447.

[4] Bletsas, A; Khisti, A; Reed, DP; Lippman, A. A simple cooperative diversity method based on network path selection [J]. IEEE JOURNAL ON SELECTED AREAS IN COMMUNICATIONS, 2006, 24(03): 659-672.

[5] Boyer, J; Falconer, DD; Yanikomeroglu, H. Multihop diversity in wireless relaying channels [J]. IEEE TRANSACTIONS ON COMMUNICATIONS, 2004, 52(10): 1820-1830.

[6] Chattha, Jawwad Nasar; Uppal, Momin. Joint Noisy Network Coding and Decode-Forward Relaying for Non-Orthogonal Multiple Access [J]. IEEE TRANSACTIONS ON WIRELESS COMMUNICATIONS, 2019, 18(01): 296-309.

[7] Chen Z., 2018, DECENTRALIZED COMPUT

[8] Chu, Shao-I. Secrecy Analysis of Modify-and-Forward Relaying With Relay Selection [J]. IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY, 2019, 68(02): 1796-1809.

[9] Dilokthanakul, Nat; Kaplanis, Christos; Pawlowski, Nick; Shanahan, Murray. Feature Control as Intrinsic Motivation for Hierarchical Reinforcement Learning [J]. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, 2019, 30(11): 3409-3418.

[10] Duarte, F. L.; de Lamare, R. C. Buffer-Aided Max-Link Relay Selection for Multi-Way Cooperative Multi-Antenna Systems [J]. IEEE COMMUNICATIONS LETTERS, 2019, 23(08): 1423-1426.
