Optimizing Relay Selection in D2D Communication for Next-Generation Wireless Networks Using Multi-Agent Reinforcement Learning: A Novel Approach

Cited by: 0

Authors:

Muharrem Sirma [1]

Adnan Kavak [1]

A. Burak Inner [1]

Affiliations:

[1] Kocaeli University, Department of Computer Engineering, Artificial Intelligence and Simulation Systems Research Lab, Wireless Communication and Information Systems Research Center

Source:

Wireless Personal Communications | 2025 / Vol. 140 / Issue 3

Keywords:

5G; Artificial intelligence; D2D; Internet of everything; Machine learning; Multi-agent; Next-generation network; Reinforcement learning; Relay selection

DOI:

10.1007/s11277-025-11753-z

Abstract:

Device-to-device (D2D) communication, which offers a direct communication channel between devices, has been introduced as an alternative communication technique for next-generation wireless networks; it aims to alleviate the workload traditionally managed by base stations (BS) in cellular systems. However, D2D pairs that encounter connectivity issues or need extended communication ranges require a relay node (RN) to facilitate communication. When efficient communication with the target device is only possible over a relay device, finding the candidate relay with the best link availability among the source-relay-destination devices and optimizing end-to-end throughput are challenges to be considered in relay-assisted D2D communication. Despite the plethora of studies on relay selection in D2D communication, there is a need for a method that systematically integrates multiple disruptive factors inherent in wireless channels. Although reinforcement learning (RL) has primarily been applied in resource management tasks such as power control, resource block (RB) availability, and spectrum allocation, its application to finding the optimum relay in D2D communication in dynamic wireless environments remains largely unexplored. In this paper, we propose multi-agent reinforcement learning-based relay selection (MARS), in which the source device and/or pairing devices can function as learning agents. The resource selection agent (RSA), link agent (LA), and transmission agent (TA) cooperate in the MARS method to determine the optimum relay. Source nodes in D2D pairs iteratively update their strategies through interactions with the wireless environment and other devices to maximize the cost function and thereby select the most suitable relay in the multi-hop D2D communication scenario. The MARS method considers the combined effect of the SINR, link reliability, and throughput values when estimating the cost function used to select the optimum relay node. We have performed extensive simulations for different device density scenarios in a wireless environment and compared the performance of the MARS method with that of the SINR-based relay selection approach and other RL-based methods. The results demonstrate that the proposed MARS approach outperforms the SINR-based method in terms of end-to-end link reliability and throughput performance. Compared to existing RL-based methods, the MARS approach exhibits some advantages in terms of optimizing the SINR, throughput, and link reliability with a moderate level of complexity.
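
The abstract does not give the form of the cost function, so the following is only an illustrative sketch of how SINR, link reliability, and throughput might be combined into a single relay score. Everything specific here is an assumption and not taken from the paper: the weighted-sum form, the equal weights, the normalization caps (`sinr_max_db`, `thr_max`), the bottleneck (minimum) rule for the two-hop SINR, and the half-duplex Shannon-rate throughput estimate. It is a static scoring rule, not the multi-agent learning procedure itself.

```python
import math
from dataclasses import dataclass


@dataclass
class RelayCandidate:
    name: str
    sinr_sr_db: float   # source -> relay SINR in dB
    sinr_rd_db: float   # relay -> destination SINR in dB
    reliability: float  # estimated end-to-end link reliability in [0, 1]


def end_to_end_sinr_db(c: RelayCandidate) -> float:
    # Assumption: the weaker hop limits the two-hop link.
    return min(c.sinr_sr_db, c.sinr_rd_db)


def throughput_bps_hz(c: RelayCandidate) -> float:
    # Assumption: Shannon rate on the bottleneck hop, halved because a
    # half-duplex relay needs two time slots per delivered symbol.
    sinr_linear = 10 ** (end_to_end_sinr_db(c) / 10)
    return 0.5 * math.log2(1 + sinr_linear)


def score(c: RelayCandidate,
          weights=(1 / 3, 1 / 3, 1 / 3),   # hypothetical equal weights
          sinr_max_db: float = 30.0,        # hypothetical normalization cap
          thr_max: float = 5.0) -> float:   # hypothetical normalization cap
    w_sinr, w_rel, w_thr = weights
    # Clamp each normalized term to [0, 1] so the score stays in [0, 1].
    sinr_norm = min(max(end_to_end_sinr_db(c) / sinr_max_db, 0.0), 1.0)
    thr_norm = min(throughput_bps_hz(c) / thr_max, 1.0)
    return w_sinr * sinr_norm + w_rel * c.reliability + w_thr * thr_norm


def select_relay(candidates, **kwargs) -> RelayCandidate:
    # Pick the candidate with the highest combined score.
    return max(candidates, key=lambda c: score(c, **kwargs))


candidates = [
    RelayCandidate("A", sinr_sr_db=20.0, sinr_rd_db=5.0, reliability=0.9),
    RelayCandidate("B", sinr_sr_db=12.0, sinr_rd_db=11.0, reliability=0.8),
]
best = select_relay(candidates)
```

With these made-up numbers, candidate A has the stronger first hop, but its weak second hop lowers both its bottleneck SINR and its throughput, so the combined score favors B. In a learning-based scheme, a score of this kind could serve as the reward signal that agents try to maximize over time.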

Pages: 945-969

Page count: 24