Deep Reinforcement Learning Based AoI Minimization for NOMA-Enabled Integrated Satellite-Terrestrial Networks

Cited by: 0

Authors:

He, Xinyu [1]

Yang, Yang [1]

Lee, Jemin [2]

He, Gang [1]

Yan, Qing [1]

Affiliations:

[1] Beijing Univ Posts & Telecommun, Sch Artificial Intelligence, Beijing 100876, Peoples R China

[2] Yonsei Univ, Sch Elect & Elect Engn, Seoul 03722, South Korea

Source:

IEEE TRANSACTIONS ON VEHICULAR TECHNOLOGY | 2025 / Vol. 74 / No. 02

Funding:

Hainan Provincial Natural Science Foundation; National Natural Science Foundation of China;

Keywords:

Satellites; Optimization; NOMA; Low earth orbit satellites; Resource management; Receivers; Minimization; Training; Signal to noise ratio; Vectors; Integrated satellite-terrestrial networks; non-orthogonal multiple access; age of information; deep reinforcement learning; generalized advantage estimation;

DOI:

10.1109/TVT.2024.3472274

Chinese Library Classification (CLC):

TM [Electrical Engineering]; TN [Electronic Technology, Communication Technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Integrated satellite-terrestrial networks (ISTN) are regarded as a critical 6G technology capable of providing seamless global communication coverage. However, to provide more real-time and efficient communication services, information freshness is becoming increasingly important in the ISTN. In this paper, we investigate a multi-agent DRL-based power control algorithm for minimizing the average age of information (AoI) in the non-orthogonal multiple access (NOMA) enabled ISTN. Specifically, a multi-agent proximal policy optimization (PPO) is employed to jointly allocate the power and reduce inter-beam interference. Due to the imperfect serial interference cancellation (SIC), the average AoI minimization is formulated as a non-convex problem that satisfies both quality of service constraints and total power limitations. Therefore, a generalized advantage estimation PPO-based power allocation (GAP3A) algorithm is proposed, which operates by centralized training and decentralized execution (CTDE) to enhance the training stability. Especially, an ingenious, global reward function is designed to guide the DRL training towards minimizing the average AoI of the ISTN. Simulation results demonstrate that the proposed CTDE-based GAP3A algorithm can effectively lower the average AoI to an optimal level and significantly outperforms the other three benchmarks.

Pages: 3567-3572

Page count: 6